Why does PySpark execute only the default statement in my custom `SQLTransformer`?

I wrote a custom SQLTransformer in PySpark. Setting a default SQL statement is mandatory for the code to be executed at all. I can save the custom transformer from Python, load it, and run it from Scala and/or Python, but only the default statement is executed, even though the _transform method does something else. The result is the same in both languages, so the problem is not related to the _to_java method or the JavaTransformer class.

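from pyspark.ml.feature import SQLTransformer

class filter(SQLTransformer):

    def __init__(self):
        super(filter, self).__init__()
        # Default statement; this is the only part that ends up being executed
        self._setDefault(statement="select text, label from __THIS__")

    def _transform(self, df):
        # Intended custom logic, which is not run after save/load
        df = df.filter(df.id > 23)
        return df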

edited Nov 13 '18 at 19:12

asked Nov 13 '18 at 13:04

Bentech

447

I need to call the SQLTransformer in a Scala pipeline. I can save the SQLTransformer from Python, then load and run it on the Scala side, but even though I define a _transform method in the class, only the default statement is executed on the Scala side.

– Bentech
Nov 13 '18 at 18:29

apache-spark pyspark pipeline apache-spark-ml

1 Answer

Such information flow is not supported. To create a Transformer that can be used from both Python and Scala code bases, you have to:

Implement a Java or Scala Transformer, in your case extending org.apache.spark.ml.feature.SQLTransformer.

Add a Python wrapper extending pyspark.ml.wrapper.JavaTransformer, the same way pyspark.ml.feature.SQLTransformer does, and interface with the JVM counterpart from it.
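A rough way to see why the Python-side _transform gets lost: as far as I understand it, persistence for JVM-backed stages records the stage's class name and its params, not any Python source. The sketch below is a toy model in plain Python, not Spark's actual code. `ToySQLTransformer`, `ToyFilter`, `REGISTRY`, and `load` are hypothetical names, and the JSON layout is only loosely modeled on the metadata Spark writes.

```python
import json


class ToySQLTransformer:
    """Toy stand-in (hypothetical) for a JVM-backed stage whose state is only its params."""

    jvm_class = "org.apache.spark.ml.feature.SQLTransformer"

    def __init__(self, statement=None):
        self.statement = statement

    def transform(self, rows):
        # The toy does not run SQL; it reports which statement would be executed.
        return {"executed": self.statement, "rows": rows}

    def save(self):
        # Only the class name and the params are written, never Python code.
        return json.dumps({"class": self.jvm_class,
                           "paramMap": {"statement": self.statement}})


class ToyFilter(ToySQLTransformer):
    """Python-only subclass that overrides transform, like the class in the question."""

    def __init__(self):
        super().__init__(statement="select text, label from __THIS__")

    def transform(self, rows):
        return {"executed": "python override",
                "rows": [r for r in rows if r["id"] > 23]}


# The loader only knows classes that exist on the "JVM" side.
REGISTRY = {ToySQLTransformer.jvm_class: ToySQLTransformer}


def load(payload):
    # Rebuild the stage from the recorded class name and params alone.
    meta = json.loads(payload)
    return REGISTRY[meta["class"]](**meta["paramMap"])


rows = [{"id": 1}, {"id": 42}]
restored = load(ToyFilter().save())
result = restored.transform(rows)
```

In this model `restored` is a plain `ToySQLTransformer` carrying the default statement, so the override in `ToyFilter` never runs after a save/load round trip, which mirrors the behavior described in the question.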

answered Nov 13 '18 at 21:43

user10648740

Thanks. So a custom Transformer written in Python cannot be used in a Scala pipeline. If I have to write the same code in both Scala and Python, I might as well use what is already written in Scala directly in my Scala pipeline.

– Bentech
Nov 14 '18 at 15:36
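For the specific filter shown in the question, one option that may avoid custom code in either language is to put the predicate into the statement itself, since the statement param is what reportedly survives save and load. A minimal sketch, assuming the input has `text`, `label`, and `id` columns; `STATEMENT` and `make_filter_stage` are hypothetical names, and pyspark is imported lazily so the constant can be defined without a Spark installation:

```python
# Statement combining the projection and the filter from the question.
STATEMENT = "SELECT text, label FROM __THIS__ WHERE id > 23"


def make_filter_stage():
    """Build a stock SQLTransformer; no Python subclass is involved."""
    # Imported here so that defining STATEMENT does not require Spark.
    from pyspark.ml.feature import SQLTransformer

    return SQLTransformer(statement=STATEMENT)
```

With an active Spark session, saving the stage returned by `make_filter_stage()` from Python and loading it with `SQLTransformer.load(path)` in Scala should, under these assumptions, run the same statement on both sides.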
