How does spark handle aggregate max for non numeric values? [duplicate]












This question already has an answer here:




  • how to get max(date) from given set of data grouped by some fields using pyspark? (1 answer)

I have a DataFrame, DF1, with the following data:

+-----+---------+
|value|condition|
+-----+---------+
|    1|        Y|
|    2|        Y|
|    3|        Y|
|    3|        N|
|    3|        N|
+-----+---------+


I want to understand what the result will be if I use max in an aggregation:

DF1.groupby(DF1).max(condition)

Does it give the string with the highest count (which would be Y)? If so, how do I get the maximum value according to alphabetical order instead?

Edit --

This is not about dates or any other datatype; I want this exclusively for strings.










marked as duplicate by eliasah apache-spark
Nov 15 '18 at 14:06


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















  • I want it exclusively for strings; the link provided is for dates. @user10465355

    – Sundeep Pidugu
    Nov 15 '18 at 13:53
















scala apache-spark apache-spark-sql






edited Nov 15 '18 at 13:52







Sundeep Pidugu

















asked Nov 15 '18 at 13:10









Sundeep Pidugu




1 Answer
Try this:



scala> val df1 = Seq((1,"Y"),(2,"Y"),(3,"N"),(3,"Z")).toDF("value","condition")
df1: org.apache.spark.sql.DataFrame = [value: int, condition: string]

scala> df1.show
+-----+---------+
|value|condition|
+-----+---------+
| 1| Y|
| 2| Y|
| 3| N|
| 3| Z|
+-----+---------+


scala> df1.agg(max("condition")).show
+--------------+
|max(condition)|
+--------------+
| Z|
+--------------+
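What max does here is plain lexicographic (Unicode code point) comparison, not a frequency count; it is the same ordering Scala itself uses for strings. A minimal sketch without Spark, mirroring the condition column above:

```scala
// max on strings compares character-by-character using Unicode code
// points, so "Z" > "Y" > "N" -- no counting is involved.
val conditions = Seq("Y", "Y", "N", "Z")

val maxCondition = conditions.max  // lexicographic maximum: "Z"

// If the number of occurrences is what you want instead, count per key:
val counts = conditions.groupBy(identity).map { case (k, v) => (k, v.size) }
// counts == Map("Y" -> 2, "N" -> 1, "Z" -> 1)

println(maxCondition)
```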





  • So when applying max, it automatically gives the alphabetically highest letter, rather than the most frequent one?

    – Sundeep Pidugu
    Nov 15 '18 at 13:55











  • yes! Is this not what you wanted?

    – Sathiyan S
    Nov 16 '18 at 6:18











  • Yeah! What if I want to calculate the number of occurrences of each letter?

    – Sundeep Pidugu
    Nov 16 '18 at 8:30






  • df1.groupBy("condition").agg(count("condition")).show

    – Sathiyan S
    Nov 16 '18 at 8:48


















edited Nov 15 '18 at 15:13









Sundeep Pidugu

answered Nov 15 '18 at 13:32









Sathiyan S
