Why does PySpark execute only the default statement in my custom `SQLTransformer`?
I wrote a custom SQLTransformer in PySpark. Setting a default SQL statement is mandatory for the code to be executed. I can save the custom transformer from Python, load it, and execute it using Scala and/or Python, but only the default statement is executed, even though the _transform method does something else. I get the same result in both languages, so the problem is not related to the _to_java method or the JavaTransformer class.

    from pyspark.ml.feature import SQLTransformer

    class filter(SQLTransformer):
        def __init__(self):
            super(filter, self).__init__()
            self._setDefault(statement="select text, label from __THIS__")

        def _transform(self, df):
            df = df.filter(df.id > 23)
            return df

apache-spark pyspark pipeline apache-spark-ml
I need to call the SQLTransformer in a Scala pipeline. I can save the SQLTransformer from Python, then load and run it on the Scala side, but even though I define a _transform method in the class, only the default statement is executed on the Scala side.
– Bentech
Nov 13 '18 at 18:29
asked Nov 13 '18 at 13:04 by Bentech; edited Nov 13 '18 at 19:12
1 Answer
This kind of information flow is not supported. To create a Transformer that can be used from both Python and Scala code bases, you have to:
- Implement a Java or Scala Transformer, in your case extending org.apache.spark.ml.feature.SQLTransformer.
- Add a Python wrapper extending pyspark.ml.wrapper.JavaTransformer, the same way pyspark.ml.feature.SQLTransformer does, and interface with the JVM counterpart from it.
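One way to picture why the Python-only override gets lost is the toy model below. It is plain Python, not Spark code, and every name in it (SketchSQLTransformer, FilterTransformer, save, load) is hypothetical. It assumes that persistence records only the base class name and the parameter values, which appears to be what happens for JVM-backed transformers; the body of a Python method is never written out.

```python
import json


class SketchSQLTransformer:
    """Hypothetical stand-in for a JVM-backed transformer (not a Spark class)."""

    def __init__(self, statement="SELECT * FROM __THIS__"):
        self.statement = statement

    def transform(self, rows):
        # Stand-in for running the SQL statement: it only tags each row.
        return [dict(row, applied=self.statement) for row in rows]

    def save(self):
        # Assumption: only the base class name and the params are persisted.
        return json.dumps({"class": "SketchSQLTransformer",
                           "params": {"statement": self.statement}})

    @staticmethod
    def load(payload):
        # The reader rebuilds the base class from the stored params.
        meta = json.loads(payload)
        return SketchSQLTransformer(**meta["params"])


class FilterTransformer(SketchSQLTransformer):
    def transform(self, rows):
        # Python-only override: nothing in save() records this logic.
        return [row for row in rows if row["id"] > 23]


rows = [{"id": 10}, {"id": 42}]
custom = FilterTransformer(statement="SELECT text, label FROM __THIS__")
reloaded = SketchSQLTransformer.load(custom.save())

print(len(custom.transform(rows)))    # 1: the override applies before saving
print(type(reloaded).__name__)        # SketchSQLTransformer: the subclass is gone
print(len(reloaded.transform(rows)))  # 2: only the persisted statement is applied
```

In this model the only thing that survives a save and load round trip is the statement, which is consistent with the behaviour described in the question. Logic that has to run on the Scala side therefore needs to live either in the statement itself or in a JVM class.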
Thanks. So a custom Transformer written in Python cannot be used in a Scala pipeline. If I need to write the same code in both Scala and Python, I had better directly use what is already written in Scala in my Scala pipeline.
– Bentech
Nov 14 '18 at 15:36
answered Nov 13 '18 at 21:43
user10648740