Why does PySpark execute only the default statement in my custom `SQLTransformer`?

I wrote a custom SQLTransformer in PySpark. Setting a default SQL statement is mandatory for the code to execute. I can save the custom transformer from Python, then load and execute it from Scala and/or Python, but only the default statement is executed, even though the `_transform` method does something else. I get the same result in both languages, so the problem is not related to the `_to_java` method or the `JavaTransformer` class.





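```python
from pyspark.ml.feature import SQLTransformer


class filter(SQLTransformer):
    def __init__(self):
        super(filter, self).__init__()
        # A statement is required; this default only projects two columns.
        self._setDefault(statement="select text, label from __THIS__")

    def _transform(self, df):
        # Intended custom logic: keep only rows with id > 23.
        df = df.filter(df.id > 23)
        return df
```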
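Why the override disappears can be illustrated with a toy model in plain Python (this is not PySpark code). The assumption, based on how PySpark's JVM-backed writers and readers behave, is that a saved stage records only its JVM class name and its param values (Spark's metadata file uses keys such as `class` and `paramMap`), and that on load PySpark chooses the Python wrapper class by rewriting the JVM class name from `org.apache.spark` to `pyspark`. The helpers `save_stage` and `python_wrapper_for` are hypothetical. Under that model, nothing in the saved output refers to the subclass or to its `_transform`:

```python
import json

# Toy model only: the helpers below are hypothetical, not PySpark APIs.


def save_stage(java_class_name, params):
    """Persist what a JVM-backed stage writes: its JVM class name and params."""
    return json.dumps({"class": java_class_name, "paramMap": params})


def python_wrapper_for(java_class_name):
    """Approximate the name rewrite PySpark applies when rebuilding a wrapper."""
    return java_class_name.replace("org.apache.spark", "pyspark")


# Saving the custom `filter` stage: the JVM object behind it is a plain
# SQLTransformer, so only that class name and the statement param are stored.
saved = save_stage(
    "org.apache.spark.ml.feature.SQLTransformer",
    {"statement": "select text, label from __THIS__"},
)

meta = json.loads(saved)
print(python_wrapper_for(meta["class"]))  # pyspark.ml.feature.SQLTransformer
print(meta["paramMap"]["statement"])      # select text, label from __THIS__
print("_transform" in saved)              # False
```

Under this model, loading the same metadata in Scala would likewise yield a plain `org.apache.spark.ml.feature.SQLTransformer`, which is consistent with the behavior reported in the question.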
  • I need to call the SQLTransformer in a Scala pipeline. I can save the SQLTransformer from Python, then load and run it on the Scala side, but even though I define a `_transform` method in the class, only the default statement is executed on the Scala side.

    – Bentech
    Nov 13 '18 at 18:29
apache-spark pyspark pipeline apache-spark-ml

edited Nov 13 '18 at 19:12 by Bentech
asked Nov 13 '18 at 13:04 by Bentech
1 Answer
Such an information flow is not supported. To create a Transformer that can be used from both Python and Scala code bases, you have to:

  • Implement a Java or Scala Transformer, in your case extending `org.apache.spark.ml.feature.SQLTransformer`.

  • Add a Python wrapper extending `pyspark.ml.wrapper.JavaTransformer`, the same way `pyspark.ml.feature.SQLTransformer` does, and interface with its JVM counterpart from there.
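For a row filter as simple as the one in the question, one option that is not part of the answer above, offered only as a suggestion, is to fold the condition into the statement itself, since the statement param is what gets saved and executed on both sides. In PySpark that would be `SQLTransformer(statement="select text, label from __THIS__ where id > 23")`. SQLTransformer registers its input as a temporary view and replaces `__THIS__` with that view's name; the sketch below imitates that substitution with Python's built-in `sqlite3`, only to check the SQL without a Spark session. The helper `run_statement`, the table name `input_view`, and the sample rows are hypothetical, and SQLite's dialect is not identical to Spark SQL.

```python
import sqlite3

# Projection and filter combined in one statement (hypothetical example).
STATEMENT = "select text, label from __THIS__ where id > 23"


def run_statement(statement, rows):
    """Hypothetical helper: load rows into a table and substitute __THIS__,
    loosely mirroring what SQLTransformer does with a temporary view."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table input_view (id integer, text text, label real)")
        conn.executemany("insert into input_view values (?, ?, ?)", rows)
        query = statement.replace("__THIS__", "input_view")
        return conn.execute(query).fetchall()
    finally:
        conn.close()


sample = [(10, "too small", 0.0), (24, "kept", 1.0), (42, "also kept", 0.0)]
print(run_statement(STATEMENT, sample))
```

Because no custom class is involved, a stage built this way should load as an ordinary `SQLTransformer` in either Python or Scala; logic that cannot be expressed in SQL would still need the JVM-side implementation described in the answer.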
  • Thanks. So a custom Transformer written in Python cannot be used in a Scala pipeline, and if I have to write the same code in both Scala and Python anyway, I'm better off using what is already written in Scala directly in my Scala pipeline.

    – Bentech
    Nov 14 '18 at 15:36
answered Nov 13 '18 at 21:43 by user10648740