Applying groupby twice on pandas dataframe

I have loaded a huge .csv file into a pandas DataFrame. The table is structured something like this:

Category  Time   Col1
1         00:00  3
1         01:00  6
1         01:00  10
2         02:00  8
2         02:00  12
2         03:00  6
3         04:00  13
3         05:00  8

For every category, I want to compute the following:

[sum over each Time in the category of (sum of Col1 for that Time) * (count of rows for that Time)] / (total number of rows in the table)

So basically I want to group by Category once and then, within every category, group again by Time and compute the above.

So for the above example, my output should look like this:

Category  Col1
1         [3 + (2 * (6 + 10))] / 8
2         [(2 * (8 + 12)) + 6] / 8
3         [13 + 8] / 8
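The arithmetic in the expected output can be checked by hand. The following is a minimal pure-Python sketch, assuming the table consists of exactly the eight sample rows shown above; the names `rows`, `groups`, and `expected` are illustrative and not part of the original question.

```python
from collections import defaultdict

# The eight sample rows from the question: (Category, Time, Col1)
rows = [
    (1, "00:00", 3), (1, "01:00", 6), (1, "01:00", 10),
    (2, "02:00", 8), (2, "02:00", 12), (2, "03:00", 6),
    (3, "04:00", 13), (3, "05:00", 8),
]

# Collect the Col1 values of every (Category, Time) group
groups = defaultdict(list)
for category, time, col1 in rows:
    groups[(category, time)].append(col1)

# Per category: add up sum * count over its Time groups,
# each term divided by the total number of rows
expected = defaultdict(float)
for (category, _time), values in groups.items():
    expected[category] += sum(values) * len(values) / len(rows)

print(dict(expected))  # {1: 4.375, 2: 5.75, 3: 2.625}
```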

edited Nov 11 at 0:46 by coldspeed
asked Nov 11 at 0:40 by Mojojo


python python-3.x pandas pandas-groupby


2 Answers

This can be easily done in 2 groupby steps:

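# Step 1: for each (Category, Time) group, count the rows and sum Col1
v = df.groupby(['Category', 'Time']).Col1.agg(['count', 'sum'])

# Step 2: multiply count by sum, add up within each Category, divide by the total row count
v['count'].mul(v['sum']).groupby(level=0).sum() / len(df)

Category
1    4.375
2    5.750
3    2.625
dtype: float64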
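To try the two steps end to end, here is a self-contained sketch. It assumes `df` holds exactly the eight sample rows from the question, with `Time` kept as plain strings; the variable name `result` is illustrative.

```python
import pandas as pd

# Sample data copied from the question (Time kept as strings for simplicity)
df = pd.DataFrame({
    "Category": [1, 1, 1, 2, 2, 2, 3, 3],
    "Time": ["00:00", "01:00", "01:00", "02:00", "02:00", "03:00", "04:00", "05:00"],
    "Col1": [3, 6, 10, 8, 12, 6, 13, 8],
})

# Step 1: one row per (Category, Time) with the row count and the Col1 sum
v = df.groupby(["Category", "Time"])["Col1"].agg(["count", "sum"])

# Step 2: count * sum per group, summed within each Category (index level 0),
# then divided by the total number of rows in df
result = v["count"].mul(v["sum"]).groupby(level=0).sum() / len(df)
print(result)
```

The second `groupby(level=0)` works because the first aggregation leaves `Category` as the outer level of a two-level index, so no column needs to be restored before regrouping.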

answered Nov 11 at 0:49 by coldspeed

But I need to have only 3 rows, one for each category. This formula must be applied for each category "[summation(sum of col1 for each time of each category) * (count of col1 for each time in each category)]/(total number of rows)" resulting in one value in col1.
– Mojojo
Nov 11 at 0:54

@Mojojo What's wrong with the answer here?
– coldspeed
Nov 11 at 1:00
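On the request for exactly three rows with one value in Col1: the Series from the answer already has one entry per category. If a two-column table is wanted instead, one possible sketch is below. It assumes the sample data from the question, and the output column name `Col1` is taken from the desired output there.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Category": [1, 1, 1, 2, 2, 2, 3, 3],
    "Time": ["00:00", "01:00", "01:00", "02:00", "02:00", "03:00", "04:00", "05:00"],
    "Col1": [3, 6, 10, 8, 12, 6, 13, 8],
})

v = df.groupby(["Category", "Time"])["Col1"].agg(["count", "sum"])

result = (
    v["count"].mul(v["sum"])
    .groupby(level=0).sum()
    .div(len(df))
    .rename("Col1")   # name the value column as in the desired output
    .reset_index()    # move Category from the index into a regular column
)
print(result)
```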


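Use transform with 'count' to put each (Category, Time) group's row count on every row, then group the weighted Series by Category to get the result. Note that the denominator used here is each category's Col1 total, not the total number of rows as in the question; dividing by len(df) instead gives the values the question asks for.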

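# Row count of each (Category, Time) group, aligned with the rows of df
s1 = df.groupby(['Category', 'Time']).Col1.transform('count')

# Weight each Col1 value by its group's count, sum per Category, divide by the category's Col1 total
(s1 * df.Col1).groupby(df['Category']).sum() / df.groupby('Category').Col1.sum()

Out[631]:
Category
1    1.842105
2    1.769231
3    1.000000
Name: Col1, dtype: float64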
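The transform approach can be run side by side with both denominators. This is a sketch assuming the eight sample rows from the question; the names `weighted`, `as_posted`, and `per_question` are illustrative. Weighting each row by its group's count and summing per category gives the same numerator as count * sum per (Category, Time) group, so only the denominator differs.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Category": [1, 1, 1, 2, 2, 2, 3, 3],
    "Time": ["00:00", "01:00", "01:00", "02:00", "02:00", "03:00", "04:00", "05:00"],
    "Col1": [3, 6, 10, 8, 12, 6, 13, 8],
})

# Row count of each (Category, Time) group, repeated on every row of that group
s1 = df.groupby(["Category", "Time"])["Col1"].transform("count")

# Sum of count * Col1 per Category
weighted = (s1 * df["Col1"]).groupby(df["Category"]).sum()

# Denominator as in the answer above: each category's Col1 total
as_posted = weighted / df.groupby("Category")["Col1"].sum()

# Denominator as in the question: total number of rows in df
per_question = weighted / len(df)

print(as_posted)
print(per_question)
```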

answered Nov 11 at 2:15 by W-B
