How does Spark handle aggregate max for non-numeric values? [duplicate]
This question already has an answer here:
how to get max(date) from given set of data grouped by some fields using pyspark?
1 answer
I have a DataFrame with the following data:
DF1
+-----+---------+
|value|condition|
+-----+---------+
|    1|        Y|
|    2|        Y|
|    3|        Y|
|    3|        N|
|    3|        N|
+-----+---------+
I want to understand what the result will be if I apply max in an aggregation on the string column:
DF1.groupby(DF1).max(condition)
Does it return the string that occurs most often, which here is Y? If so, how do I get the maximum value according to alphabetical order instead?
Edit:
This is not about dates or any other data type; I want it exclusively for strings.
scala apache-spark apache-spark-sql
marked as duplicate by eliasah
Nov 15 '18 at 14:06
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
I want it exclusively for strings, whereas the linked question is about dates. @user10465355
– Sundeep Pidugu
Nov 15 '18 at 13:53
edited Nov 15 '18 at 13:52
Sundeep Pidugu
asked Nov 15 '18 at 13:10
Sundeep Pidugu
1 Answer
Try this,
scala> val df1 = Seq((1,"Y"),(2,"Y"),(3,"N"),(3,"Z")).toDF("value","condition")
df1: org.apache.spark.sql.DataFrame = [value: int, condition: string]
scala> df1.show
+-----+---------+
|value|condition|
+-----+---------+
| 1| Y|
| 2| Y|
| 3| N|
| 3| Z|
+-----+---------+
scala> df1.agg(max("condition")).show
+--------------+
|max(condition)|
+--------------+
| Z|
+--------------+
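To spell out what the output above shows: max on a string column compares the values themselves, not how often they occur. The following is a minimal sketch, not the answerer's code; the object name StringMaxSketch and the helper maxString are hypothetical, and it assumes Spark's default binary string ordering (under which uppercase letters sort before lowercase ones). It uses the data from the question and also groups by value. As far as I know, the shortcut groupBy(...).max("condition") only accepts numeric columns, so the sketch goes through agg(max(...)) instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max}

// Hypothetical helper object, for illustration only.
object StringMaxSketch {
  // Returns the largest string in `column` under Spark's string ordering.
  // Frequency plays no role in the comparison.
  def maxString(df: DataFrame, column: String): String =
    df.agg(max(col(column))).head().getString(0)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for this demonstration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("string-max-sketch")
      .getOrCreate()
    import spark.implicits._

    // Data from the question: Y occurs three times, N twice.
    val df1 = Seq((1, "Y"), (2, "Y"), (3, "Y"), (3, "N"), (3, "N"))
      .toDF("value", "condition")

    // Prints Y because "Y" sorts after "N", not because it is more frequent.
    println(maxString(df1, "condition"))

    // Per-group maximum of the string column, one row per distinct value.
    df1.groupBy("value")
      .agg(max("condition").as("max_condition"))
      .orderBy("value")
      .show()

    spark.stop()
  }
}
```

With the answer's data (Y, Y, N, Z) the same helper returns Z, even though Z appears only once.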
So when max is applied, it automatically returns the alphabetically highest value rather than the value that occurs most often?
– Sundeep Pidugu
Nov 15 '18 at 13:55
Yes! Is this not what you wanted?
– Sathiyan S
Nov 16 '18 at 6:18
Yes! What if I want to count the number of occurrences of each letter?
– Sundeep Pidugu
Nov 16 '18 at 8:30
df1.groupBy("condition").agg(count("condition")).show
– Sathiyan S
Nov 16 '18 at 8:48
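The count in the comment above can be taken one step further to find the most frequent value, which is what the question first assumed max would return. This is only a sketch under stated assumptions: the object name MostFrequentSketch, the helpers countsPerValue and mostFrequent, and the output column name occurrences are all hypothetical, and ties are broken by taking the smallest value in sort order.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, desc}

// Hypothetical helper object, for illustration only.
object MostFrequentSketch {
  // One row per distinct value of `column` with its number of occurrences.
  // count(col) skips nulls, so a null group would get a count of 0.
  def countsPerValue(df: DataFrame, column: String): DataFrame =
    df.groupBy(column).agg(count(col(column)).as("occurrences"))

  // Value with the highest count; ties go to the smallest value in sort order.
  // head() throws on an empty DataFrame, so callers should guard against that.
  def mostFrequent(df: DataFrame, column: String): String =
    countsPerValue(df, column)
      .orderBy(desc("occurrences"), col(column))
      .head()
      .getString(0)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for this demonstration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("most-frequent-sketch")
      .getOrCreate()
    import spark.implicits._

    // Data from the question: Y occurs three times, N twice.
    val df1 = Seq((1, "Y"), (2, "Y"), (3, "Y"), (3, "N"), (3, "N"))
      .toDF("value", "condition")

    countsPerValue(df1, "condition").show()
    println(mostFrequent(df1, "condition")) // prints Y (3 occurrences vs 2)

    spark.stop()
  }
}
```

Sorting the counts and taking the first row keeps the sketch short; for many distinct values, a window function or a max over a struct of (count, value) may be a better fit.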
edited Nov 15 '18 at 15:13
Sundeep Pidugu
answered Nov 15 '18 at 13:32
Sathiyan S