Applying groupby twice on a pandas DataFrame
I am storing a huge .csv file in a pandas DataFrame. The table is structured roughly like this:

Category  Time   Col1
1         00:00  3
1         01:00  6
1         01:00  10
2         02:00  8
2         02:00  12
2         03:00  6
3         04:00  13
3         05:00  8
For every category I want to compute the following: for each Time in that category, multiply the sum of Col1 by the number of rows with that Time, add these products up, and divide the total by the number of rows in the whole DataFrame.
In other words, I want to group by Category once and then, within each category, group again by Time and compute the value described above.
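The same computation written as a formula, as a sketch in my own notation (none of these symbols come from the question): for a category c, let T_c be the set of Time values occurring in c, n(c,t) the number of rows with Category c and Time t, s(c,t) the sum of Col1 over those rows, and N the total number of rows. The second line checks category 1 of the sample.

```latex
\[
  \mathrm{result}(c) = \frac{1}{N} \sum_{t \in T_c} n(c,t)\, s(c,t)
\]
\[
  \mathrm{result}(1) = \frac{1 \cdot 3 + 2 \cdot (6 + 10)}{8} = \frac{35}{8} = 4.375
\]
```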
For the example above, my output should look like this (8 is the total number of rows):

Category  Col1
1         [3 + (2 * (6 + 10))] / 8
2         [(2 * (8 + 12)) + 6] / 8
3         [13 + 8] / 8
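A minimal sketch that rebuilds the sample table and computes the expected numbers with a literal "groupby twice": an outer loop over Category and an inner groupby on Time. The DataFrame construction, the string-typed Time column and the variable names are assumptions for illustration; a Python-level loop over groups only pins down the arithmetic and is likely to be slow on a huge CSV.

```python
import pandas as pd

# Sample table from the question (Time kept as plain strings for simplicity)
df = pd.DataFrame({
    "Category": [1, 1, 1, 2, 2, 2, 3, 3],
    "Time": ["00:00", "01:00", "01:00", "02:00", "02:00", "03:00", "04:00", "05:00"],
    "Col1": [3, 6, 10, 8, 12, 6, 13, 8],
})

n_rows = len(df)  # total number of rows in the whole DataFrame

# Outer grouping on Category, inner grouping on Time within each category
expected = {}
for category, cat_df in df.groupby("Category"):
    per_time = cat_df.groupby("Time")["Col1"].agg(["count", "sum"])
    # Sum of (count * sum) over the Times of this category, divided by all rows
    expected[category] = (per_time["count"] * per_time["sum"]).sum() / n_rows

print(expected)
```

For the sample data this works out to 35/8, 46/8 and 21/8, i.e. 4.375, 5.75 and 2.625 for categories 1, 2 and 3.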
python python-3.x pandas pandas-groupby
edited Nov 11 at 0:46 by coldspeed
asked Nov 11 at 0:40 by Mojojo
2 Answers
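This can be done easily in two groupby steps:
# Per (Category, Time) group: the number of rows and the sum of Col1
v = df.groupby(['Category', 'Time']).Col1.agg(['count', 'sum'])
# Multiply count by sum, add up per Category (index level 0), divide by total rows
v['count'].mul(v['sum']).groupby(level=0).sum() / len(df)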
Category
1 4.375
2 5.750
3 2.625
dtype: float64
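A self-contained version of the two-step approach above, as a sketch that assumes the sample table from the question (the DataFrame literal and the name `result` are mine). It shows that the output has exactly one value per category.

```python
import pandas as pd

# Sample data from the question (assumed; Time kept as strings)
df = pd.DataFrame({
    "Category": [1, 1, 1, 2, 2, 2, 3, 3],
    "Time": ["00:00", "01:00", "01:00", "02:00", "02:00", "03:00", "04:00", "05:00"],
    "Col1": [3, 6, 10, 8, 12, 6, 13, 8],
})

# Step 1: one row per (Category, Time) holding the row count and the Col1 sum
v = df.groupby(["Category", "Time"]).Col1.agg(["count", "sum"])

# Step 2: multiply count by sum, add up within each Category (index level 0),
# then divide by the total number of rows in the whole DataFrame
result = v["count"].mul(v["sum"]).groupby(level=0).sum() / len(df)
print(result)
```

Because the second groupby runs on the already aggregated (Category, Time) table, it stays vectorised and avoids a Python loop over groups.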
answered Nov 11 at 0:49 by coldspeed
But I need to have only 3 rows, one for each category. This formula must be applied for each category "[summation(sum of col1 for each time of each category) * (count of col1 for each time in each category)]/(total number of rows)" resulting in one value in col1. – Mojojo, Nov 11 at 0:54
@Mojojo What's wrong with the answer here? – coldspeed, Nov 11 at 1:00
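The Series returned by the answer already has exactly one row per category; Category is simply its index rather than a column. If the goal is the two-column Category/Col1 table shown in the question, a sketch (assuming the sample data below, with hypothetical names `per_category` and `out`) is to convert it with `Series.reset_index`:

```python
import pandas as pd

# Sample data from the question (assumed)
df = pd.DataFrame({
    "Category": [1, 1, 1, 2, 2, 2, 3, 3],
    "Time": ["00:00", "01:00", "01:00", "02:00", "02:00", "03:00", "04:00", "05:00"],
    "Col1": [3, 6, 10, 8, 12, 6, 13, 8],
})

v = df.groupby(["Category", "Time"]).Col1.agg(["count", "sum"])
per_category = v["count"].mul(v["sum"]).groupby(level=0).sum() / len(df)

# Move the Category index into a column and name the values Col1
out = per_category.reset_index(name="Col1")
print(out)
```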
Use transform with 'count' to attach each (Category, Time) group's row count to every row, then use a SeriesGroupBy sum to get the result per category:
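# Row count of each (Category, Time) group, broadcast back onto every row
s1 = df.groupby(['Category', 'Time']).Col1.transform('count')
# Note: this divides by each category's Col1 sum, not by len(df) as the question asks
(s1 * df.Col1).groupby(df['Category']).sum() / df.groupby('Category').Col1.sum()
Out[631]:
Category
1    1.842105
2    1.769231
3    1.000000
Name: Col1, dtype: float64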
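A sketch of the same transform idea with the question's denominator (total row count) instead of the per-category Col1 sum; the sample DataFrame and the names `numerator` and `result` are assumptions. It works because every row in a (Category, Time) group is multiplied by that group's size, so summing those products over the group's rows gives count × sum for that Time.

```python
import pandas as pd

# Sample data from the question (assumed)
df = pd.DataFrame({
    "Category": [1, 1, 1, 2, 2, 2, 3, 3],
    "Time": ["00:00", "01:00", "01:00", "02:00", "02:00", "03:00", "04:00", "05:00"],
    "Col1": [3, 6, 10, 8, 12, 6, 13, 8],
})

# Size of each row's (Category, Time) group, aligned with the original rows
s1 = df.groupby(["Category", "Time"]).Col1.transform("count")

# Summing Col1 * group size per Category equals the sum over Times of count * sum
numerator = (s1 * df.Col1).groupby(df["Category"]).sum()

# Divide by the total number of rows, as the question specifies
result = numerator / len(df)
print(result)
```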
answered Nov 11 at 2:15 by W-B