Map[String,java.lang.Object] to DataFrame schema issue












I have to determine the schema from the values (not the keys) of a Map[String, Object].



Sample map:



val myMap = Map("k1" -> 1, "k2" -> "", "k3" -> new Timestamp(new Date().getTime), "k4" -> 2.0)


Currently I have created a schema from the keys like below:



// Schema built from the keys only
val schema = StructType(myMap.keys.toSeq.map {
  StructField(_, StringType) // StringType is wrong: a value in the Map can be of any data type
})

// RDD with a single Row holding the Map values
val rdd = sc.parallelize(Seq(Row.fromSeq(myMap.values.toSeq)))
// createDataFrame is defined on SQLContext / SparkSession, not on SparkContext
val df = sqlContext.createDataFrame(rdd, schema)


My problem is that a value can be a Double, a Date, a Timestamp, or anything else, so the StringType schema above is wrong.



How can I create a schema from Map values that are arbitrary objects?










  • @shaido: any ideas?

    – user3190018
    Nov 6 '18 at 3:04

  • @ramesh-maharjan: in fact, I followed one of your posts related to this question, and it worked for normal types. Any suggestions for this case?

    – user3190018
    Nov 6 '18 at 6:07


















scala apache-spark dataframe apache-spark-sql schema














edited Nov 6 '18 at 1:42 by Shaido

asked Nov 5 '18 at 23:00 by user3190018



























1 Answer














Reference: the idea comes from dataTypeFor in ScalaReflection in the Spark source code.



You can derive the data type of each field from its value like this:



import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

/**
 * Map a runtime value to the matching Spark SQL data type.
 * @param myObject value taken from the map
 * @return the corresponding [[DataType]]
 */
def createStruct(myObject: Any): DataType = myObject match {
  case _: String             => StringType
  case _: Long               => LongType
  case _: Int                => IntegerType
  case _: Float              => FloatType
  case _: Double             => DoubleType
  case _: java.sql.Timestamp => TimestampType
  // Fail loudly instead of with a MatchError when a type is not covered
  case other =>
    throw new IllegalArgumentException(s"Unsupported value type: $other")
}


Below is a sample snippet that calls the function above:



// One StructField per map entry, typed from the entry's value
val fields = myMap.toSeq.map { case (key, value) =>
  StructField(key, createStruct(value), nullable = false)
}
println("fields are " + fields)

val schema = StructType(fields)
println("-----" + schema.treeString)

// rdd is the RDD[Row] from the question, built from myMap.values
val df = sparkSession.createDataFrame(rdd, schema)
df.printSchema()
df.show()
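The question also mentions dates, which the mapping above does not cover. The following is a minimal end-to-end sketch of the same idea, not a definitive implementation. It assumes a local SparkSession (master "local[*]"), and the names MapToDataFrame, inferType and toDataFrame are hypothetical helpers introduced here for illustration, as is the extra "k5" entry. It builds the schema and the row from the same sequence of entries, so the order of fields and values cannot drift apart.

```scala
import java.sql.{Date, Timestamp}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object MapToDataFrame {
  // Hypothetical helper (not part of Spark): map a runtime value to a Spark SQL type.
  def inferType(value: Any): DataType = value match {
    case _: String    => StringType
    case _: Int       => IntegerType
    case _: Long      => LongType
    case _: Float     => FloatType
    case _: Double    => DoubleType
    case _: Boolean   => BooleanType
    case _: Timestamp => TimestampType
    case _: Date      => DateType
    case other =>
      throw new IllegalArgumentException(s"Unsupported value: $other")
  }

  // Derive both the schema and the single row from the same entry sequence.
  def toDataFrame(spark: SparkSession, data: Map[String, Any]): DataFrame = {
    val entries = data.toSeq
    val schema = StructType(entries.map { case (key, value) =>
      StructField(key, inferType(value), nullable = false)
    })
    val rows = java.util.Collections.singletonList(Row.fromSeq(entries.map(_._2)))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-to-df")
      .getOrCreate()

    val myMap: Map[String, Any] = Map(
      "k1" -> 1,
      "k2" -> "",
      "k3" -> new Timestamp(System.currentTimeMillis()),
      "k4" -> 2.0,
      "k5" -> Date.valueOf("2018-11-05") // illustrative extra entry
    )

    val df = toDataFrame(spark, myMap)
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

Throwing on an unknown type is a design choice of this sketch: it surfaces a missing case immediately rather than producing a column whose declared type does not match its values.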













edited Dec 27 '18 at 18:08 by user3190018

answered Nov 8 '18 at 18:19 by Ram Ghadiyaram































