Spark Equivalent of IF Then ELSE












10















I have seen this question earlier here and I have took lessons from that. However I am not sure why I am getting an error when I feel it should work.



I want to create a new column in existing Spark DataFrame by some rules. Here is what I wrote. iris_spark is the data frame with a categorical variable iris_spark with three distinct categories.



from pyspark.sql import functions as F

iris_spark_df = iris_spark.withColumn(
"Class",
F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))


Throws the following error.



---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-157-21818c7dc060> in <module>()
----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

TypeError: when() takes exactly 2 arguments (3 given)


---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-157-21818c7dc060> in <module>()
----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

TypeError: when() takes exactly 2 arguments (3 given)


Any idea why?










share|improve this question





























    10















    I have seen this question earlier here and I have took lessons from that. However I am not sure why I am getting an error when I feel it should work.



    I want to create a new column in existing Spark DataFrame by some rules. Here is what I wrote. iris_spark is the data frame with a categorical variable iris_spark with three distinct categories.



    from pyspark.sql import functions as F

    iris_spark_df = iris_spark.withColumn(
    "Class",
    F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))


    Throws the following error.



    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    <ipython-input-157-21818c7dc060> in <module>()
    ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

    TypeError: when() takes exactly 2 arguments (3 given)


    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    <ipython-input-157-21818c7dc060> in <module>()
    ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

    TypeError: when() takes exactly 2 arguments (3 given)


    Any idea why?










    share|improve this question



























      10












      10








      10


      6






      I have seen this question earlier here and I have took lessons from that. However I am not sure why I am getting an error when I feel it should work.



      I want to create a new column in existing Spark DataFrame by some rules. Here is what I wrote. iris_spark is the data frame with a categorical variable iris_spark with three distinct categories.



      from pyspark.sql import functions as F

      iris_spark_df = iris_spark.withColumn(
      "Class",
      F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))


      Throws the following error.



      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-157-21818c7dc060> in <module>()
      ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

      TypeError: when() takes exactly 2 arguments (3 given)


      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-157-21818c7dc060> in <module>()
      ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

      TypeError: when() takes exactly 2 arguments (3 given)


      Any idea why?










      share|improve this question
















      I have seen this question earlier here and I have took lessons from that. However I am not sure why I am getting an error when I feel it should work.



      I want to create a new column in existing Spark DataFrame by some rules. Here is what I wrote. iris_spark is the data frame with a categorical variable iris_spark with three distinct categories.



      from pyspark.sql import functions as F

      iris_spark_df = iris_spark.withColumn(
      "Class",
      F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))


      Throws the following error.



      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-157-21818c7dc060> in <module>()
      ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

      TypeError: when() takes exactly 2 arguments (3 given)


      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-157-21818c7dc060> in <module>()
      ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

      TypeError: when() takes exactly 2 arguments (3 given)


      Any idea why?







      python apache-spark pyspark apache-spark-sql






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 10 '17 at 1:43









      Community

      11




      11










      asked Aug 19 '16 at 21:59









      BaktaawarBaktaawar

      1,73593065




      1,73593065
























          2 Answers
          2






          active

          oldest

          votes


















          27














          Correct structure is either:



          (when(col("iris_class") == 'Iris-setosa', 0)
          .when(col("iris_class") == 'Iris-versicolor', 1)
          .otherwise(2))


          which is equivalent to



          CASE 
          WHEN (iris_class = 'Iris-setosa') THEN 0
          WHEN (iris_class = 'Iris-versicolor') THEN 1
          ELSE 2
          END


          or:



          (when(col("iris_class") == 'Iris-setosa', 0)
          .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
          .otherwise(2)))


          which is equivalent to:



          CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
          ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
          ELSE 2
          END
          END


          with general syntax:



          when(condition, value).when(...)


          or



          when(condition, value).otherwise(...)


          You probably mixed up things with Hive IF conditional:



          IF(condition, if-true, if-false)


          which can be used only in raw SQL with Hive support.






          share|improve this answer

































            1














            There are different ways you can achieve if-then-else.




            1. Using when function in DataFrame API.
              You can specify the list of conditions in when and also can specify otherwise what value you need. You can use this expression in nested form as well.


            2. expr function.
              Using "expr" function you can pass SQL expression in expr. PFB example. Here we are creating new column "quarter" based on month column.




            cond = """case when month > 9 then 'Q4'
            else case when month > 6 then 'Q3'
            else case when month > 3 then 'Q2'
            else case when month > 0 then 'Q1'
            end
            end
            end
            end as quarter"""

            newdf = df.withColumn("quarter", expr(cond))





            1. selectExpr function.
              We can also use the variant of select function which can take SQL expression. PFB example.



            cond = """case when month > 9 then 'Q4'
            else case when month > 6 then 'Q3'
            else case when month > 3 then 'Q2'
            else case when month > 0 then 'Q1'
            end
            end
            end
            end as quarter"""

            newdf = df.selectExpr("*", cond)



            Hope this helps.



            Regards,



            Neeraj






            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f39048229%2fspark-equivalent-of-if-then-else%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              27














              Correct structure is either:



              (when(col("iris_class") == 'Iris-setosa', 0)
              .when(col("iris_class") == 'Iris-versicolor', 1)
              .otherwise(2))


              which is equivalent to



              CASE 
              WHEN (iris_class = 'Iris-setosa') THEN 0
              WHEN (iris_class = 'Iris-versicolor') THEN 1
              ELSE 2
              END


              or:



              (when(col("iris_class") == 'Iris-setosa', 0)
              .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
              .otherwise(2)))


              which is equivalent to:



              CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
              ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
              ELSE 2
              END
              END


              with general syntax:



              when(condition, value).when(...)


              or



              when(condition, value).otherwise(...)


              You probably mixed up things with Hive IF conditional:



              IF(condition, if-true, if-false)


              which can be used only in raw SQL with Hive support.






              share|improve this answer






























                27














                Correct structure is either:



                (when(col("iris_class") == 'Iris-setosa', 0)
                .when(col("iris_class") == 'Iris-versicolor', 1)
                .otherwise(2))


                which is equivalent to



                CASE 
                WHEN (iris_class = 'Iris-setosa') THEN 0
                WHEN (iris_class = 'Iris-versicolor') THEN 1
                ELSE 2
                END


                or:



                (when(col("iris_class") == 'Iris-setosa', 0)
                .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
                .otherwise(2)))


                which is equivalent to:



                CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
                ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
                ELSE 2
                END
                END


                with general syntax:



                when(condition, value).when(...)


                or



                when(condition, value).otherwise(...)


                You probably mixed up things with Hive IF conditional:



                IF(condition, if-true, if-false)


                which can be used only in raw SQL with Hive support.






                share|improve this answer




























                  27












                  27








                  27







                  Correct structure is either:



                  (when(col("iris_class") == 'Iris-setosa', 0)
                  .when(col("iris_class") == 'Iris-versicolor', 1)
                  .otherwise(2))


                  which is equivalent to



                  CASE 
                  WHEN (iris_class = 'Iris-setosa') THEN 0
                  WHEN (iris_class = 'Iris-versicolor') THEN 1
                  ELSE 2
                  END


                  or:



                  (when(col("iris_class") == 'Iris-setosa', 0)
                  .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
                  .otherwise(2)))


                  which is equivalent to:



                  CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
                  ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
                  ELSE 2
                  END
                  END


                  with general syntax:



                  when(condition, value).when(...)


                  or



                  when(condition, value).otherwise(...)


                  You probably mixed up things with Hive IF conditional:



                  IF(condition, if-true, if-false)


                  which can be used only in raw SQL with Hive support.






                  share|improve this answer















                  Correct structure is either:



                  (when(col("iris_class") == 'Iris-setosa', 0)
                  .when(col("iris_class") == 'Iris-versicolor', 1)
                  .otherwise(2))


                  which is equivalent to



                  CASE 
                  WHEN (iris_class = 'Iris-setosa') THEN 0
                  WHEN (iris_class = 'Iris-versicolor') THEN 1
                  ELSE 2
                  END


                  or:



                  (when(col("iris_class") == 'Iris-setosa', 0)
                  .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
                  .otherwise(2)))


                  which is equivalent to:



                  CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
                  ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
                  ELSE 2
                  END
                  END


                  with general syntax:



                  when(condition, value).when(...)


                  or



                  when(condition, value).otherwise(...)


                  You probably mixed up things with Hive IF conditional:



                  IF(condition, if-true, if-false)


                  which can be used only in raw SQL with Hive support.







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Mar 1 '17 at 23:54

























                  answered Aug 19 '16 at 22:26









                  zero323zero323

                  167k40485577




                  167k40485577

























                      1














                      There are different ways you can achieve if-then-else.




                      1. Using when function in DataFrame API.
                        You can specify the list of conditions in when and also can specify otherwise what value you need. You can use this expression in nested form as well.


                      2. expr function.
                        Using "expr" function you can pass SQL expression in expr. PFB example. Here we are creating new column "quarter" based on month column.




                      cond = """case when month > 9 then 'Q4'
                      else case when month > 6 then 'Q3'
                      else case when month > 3 then 'Q2'
                      else case when month > 0 then 'Q1'
                      end
                      end
                      end
                      end as quarter"""

                      newdf = df.withColumn("quarter", expr(cond))





                      1. selectExpr function.
                        We can also use the variant of select function which can take SQL expression. PFB example.



                      cond = """case when month > 9 then 'Q4'
                      else case when month > 6 then 'Q3'
                      else case when month > 3 then 'Q2'
                      else case when month > 0 then 'Q1'
                      end
                      end
                      end
                      end as quarter"""

                      newdf = df.selectExpr("*", cond)



                      Hope this helps.



                      Regards,



                      Neeraj






                      share|improve this answer




























                        1














                        There are different ways you can achieve if-then-else.




                        1. Using when function in DataFrame API.
                          You can specify the list of conditions in when and also can specify otherwise what value you need. You can use this expression in nested form as well.


                        2. expr function.
                          Using "expr" function you can pass SQL expression in expr. PFB example. Here we are creating new column "quarter" based on month column.




                        cond = """case when month > 9 then 'Q4'
                        else case when month > 6 then 'Q3'
                        else case when month > 3 then 'Q2'
                        else case when month > 0 then 'Q1'
                        end
                        end
                        end
                        end as quarter"""

                        newdf = df.withColumn("quarter", expr(cond))





                        1. selectExpr function.
                          We can also use the variant of select function which can take SQL expression. PFB example.



                        cond = """case when month > 9 then 'Q4'
                        else case when month > 6 then 'Q3'
                        else case when month > 3 then 'Q2'
                        else case when month > 0 then 'Q1'
                        end
                        end
                        end
                        end as quarter"""

                        newdf = df.selectExpr("*", cond)



                        Hope this helps.



                        Regards,



                        Neeraj






                        share|improve this answer


























                          1












                          1








                          1







                          There are different ways you can achieve if-then-else.




                          1. Using when function in DataFrame API.
                            You can specify the list of conditions in when and also can specify otherwise what value you need. You can use this expression in nested form as well.


                          2. expr function.
                            Using "expr" function you can pass SQL expression in expr. PFB example. Here we are creating new column "quarter" based on month column.




                          cond = """case when month > 9 then 'Q4'
                          else case when month > 6 then 'Q3'
                          else case when month > 3 then 'Q2'
                          else case when month > 0 then 'Q1'
                          end
                          end
                          end
                          end as quarter"""

                          newdf = df.withColumn("quarter", expr(cond))





                          1. selectExpr function.
                            We can also use the variant of select function which can take SQL expression. PFB example.



                          cond = """case when month > 9 then 'Q4'
                          else case when month > 6 then 'Q3'
                          else case when month > 3 then 'Q2'
                          else case when month > 0 then 'Q1'
                          end
                          end
                          end
                          end as quarter"""

                          newdf = df.selectExpr("*", cond)



                          Hope this helps.



                          Regards,



                          Neeraj






                          share|improve this answer













                          There are different ways you can achieve if-then-else.




                          1. Using when function in DataFrame API.
                            You can specify the list of conditions in when and also can specify otherwise what value you need. You can use this expression in nested form as well.


                          2. expr function.
                            Using "expr" function you can pass SQL expression in expr. PFB example. Here we are creating new column "quarter" based on month column.




                          cond = """case when month > 9 then 'Q4'
                          else case when month > 6 then 'Q3'
                          else case when month > 3 then 'Q2'
                          else case when month > 0 then 'Q1'
                          end
                          end
                          end
                          end as quarter"""

                          newdf = df.withColumn("quarter", expr(cond))





                          1. selectExpr function.
                            We can also use the variant of select function which can take SQL expression. PFB example.



                          cond = """case when month > 9 then 'Q4'
                          else case when month > 6 then 'Q3'
                          else case when month > 3 then 'Q2'
                          else case when month > 0 then 'Q1'
                          end
                          end
                          end
                          end as quarter"""

                          newdf = df.selectExpr("*", cond)



                          Hope this helps.



                          Regards,



                          Neeraj







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Dec 17 '18 at 16:52









                          neeraj bhadanineeraj bhadani

                          827212




                          827212






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f39048229%2fspark-equivalent-of-if-then-else%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Florida Star v. B. J. F.

                              Danny Elfman

                              Lugert, Oklahoma