Print all categories in pyspark dataframe column
I have a large dataframe where one column, called location, has just a small number of cities, for example: ["New York", "London", "Paris", "Berlin"...].
I want to print all distinct values in that column, so that I can tell if, for example, values for one city are missing. How can I do this, since the .describe('location') method is not helping?
python pyspark pyspark-sql
asked Nov 14 '18 at 10:37
Qubix
3 Answers
With this you can print the distinct values in the column location:
from pyspark.sql import functions as F
df.select(F.col('location')).distinct().show()
answered Nov 14 '18 at 14:12
Manrique
Sorry, by distinct I meant just a list with all possible values, so not [London, Berlin, Berlin, Berlin, Paris], but just [London, Berlin, Paris]. I think mine does the same.
– Qubix
Nov 14 '18 at 14:14
Yours does too, but you are aggregating data and performing an operation you didn't really need. With my code you get the result you want more efficiently :)
– Manrique
Nov 14 '18 at 14:18
Did it help, @Qubix?
– Manrique
Nov 15 '18 at 21:56
The describe method is for basic predefined statistics such as count, mean, stddev, min, and max. To find the distinct values of a column, use the distinct() method instead.
Hope this helps.
Regards,
Neeraj
answered Nov 19 '18 at 14:10
neeraj bhadani
I found it:
df.groupBy("location").count().show()
answered Nov 14 '18 at 10:43
Qubix