How does Spark handle aggregate max for non-numeric values? [duplicate]

This question already has an answer here:

how to get max(date) from given set of data grouped by some fields using pyspark?

1 answer

I have a DataFrame with the following data:

DF1

+-----+---------+
|value|condition|
+-----+---------+
|    1|        Y|
|    2|        Y|
|    3|        Y|
|    3|        N|
|    3|        N|
+-----+---------+

I want to understand what the result will be if I apply max in an aggregation:

DF1.groupby(DF1).max(condition)

Does it return the string with the highest count, which is Y? If so, how do I get the max value according to alphabetical order?

Edit: This is not about dates or any other datatype; I want it exclusively for strings.

edited Nov 15 '18 at 13:52

asked Nov 15 '18 at 13:10

Sundeep Pidugu

40614

marked as duplicate by eliasah apache-spark
Nov 15 '18 at 14:06

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

I want it exclusively for strings, whereas the linked question is about dates. @user10465355

– Sundeep Pidugu
Nov 15 '18 at 13:53

scala apache-spark apache-spark-sql

1 Answer

Try this. On a string column, max returns the value that sorts last alphabetically (here Z), not the most frequent one:

scala> val df1 = Seq((1,"Y"),(2,"Y"),(3,"N"),(3,"Z")).toDF("value","condition")
df1: org.apache.spark.sql.DataFrame = [value: int, condition: string]

scala> df1.show
+-----+---------+
|value|condition|
+-----+---------+
|    1|        Y|
|    2|        Y|
|    3|        N|
|    3|        Z|
+-----+---------+

scala> df1.agg(max("condition")).show
+--------------+
|max(condition)|
+--------------+
|             Z|
+--------------+
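The question asked about a grouped aggregation, so here is a minimal sketch of the same idea per group. It assumes a local SparkSession and groups by `value`, which is a guess at what the asker intended; the object name `StringMaxPerGroup` and the helper `maxConditionPerValue` are hypothetical. As far as I know, the shorthand `groupBy(...).max("condition")` only accepts numeric columns and fails with an AnalysisException on strings, which is why the sketch goes through `agg(max(...))` instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.max

object StringMaxPerGroup {
  // Hypothetical helper: greatest `condition` string within each `value` group.
  def maxConditionPerValue(df: DataFrame): DataFrame =
    df.groupBy("value").agg(max("condition").as("max_condition"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("string-max-per-group")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the answer above.
    val df1 = Seq((1, "Y"), (2, "Y"), (3, "N"), (3, "Z")).toDF("value", "condition")

    // Groups 1 and 2 contain only "Y"; group 3 contains "N" and "Z", so its max is "Z".
    maxConditionPerValue(df1).orderBy("value").show()

    spark.stop()
  }
}
```

With Spark's default string ordering the comparison is byte-wise, so uppercase letters sort before lowercase ones; mixed-case data may need normalising (for example with `upper`) before taking the max.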

edited Nov 15 '18 at 15:13

Sundeep Pidugu

40614

answered Nov 15 '18 at 13:32

Sathiyan S

513310

So when applying max, it automatically gives the highest letter in alphabetical order, rather than the number of occurrences of each letter?

– Sundeep Pidugu
Nov 15 '18 at 13:55

Yes! Is this not what you wanted?

– Sathiyan S
Nov 16 '18 at 6:18

Yeah! What if I want to calculate the number of occurrences of each letter?

– Sundeep Pidugu
Nov 16 '18 at 8:30

df1.groupBy("condition").agg(count("condition")).show

– Sathiyan S
Nov 16 '18 at 8:48
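Building on the count suggested in the comment above, a minimal sketch of how one might pick the most frequent `condition` value, which is what the question first expected max to return. It uses the question's original data and assumes a local SparkSession; the object name `MostFrequentCondition`, the helper `conditionCounts`, and the column alias `occurrences` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{count, desc}

object MostFrequentCondition {
  // Hypothetical helper: number of rows per distinct `condition` value.
  def conditionCounts(df: DataFrame): DataFrame =
    df.groupBy("condition").agg(count("condition").as("occurrences"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("most-frequent-condition")
      .getOrCreate()
    import spark.implicits._

    // Data from the question: Y appears three times, N twice.
    val df1 = Seq((1, "Y"), (2, "Y"), (3, "Y"), (3, "N"), (3, "N")).toDF("value", "condition")

    // Sort by count, highest first, and take the top row.
    val top = conditionCounts(df1).orderBy(desc("occurrences")).first()
    println(s"${top.getString(0)} occurs ${top.getLong(1)} times") // Y occurs 3 times

    spark.stop()
  }
}
```

If two values tie for the highest count, which one `first()` returns is not defined by this sort alone; adding `condition` as a secondary sort key would make the choice deterministic.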

