pandas data frame iterating over 2 index variables
I have a data frame with a two-level index, "DATE" (monthly data) and "ID", and a column named Volume. For every unique ID, I want to add a new column that holds the average value of Volume.
The basic idea is to figure out which months are above the yearly average for every ID.
list(df.index)
(Timestamp('1970-09-30 00:00:00'), 12167.0)
print(df.index.name)
None
I could not find a tutorial that addresses this :(
Can someone please point me in the right direction?
                    SHRCD  EXCHCD   SICCD     PRC     VOL        RET    SHROUT
DATE       PERMNO
1970-08-31 10559.0   10.0     1.0  5311.0  35.000  1692.0   0.030657   12048.0
           12626.0   10.0     1.0  5411.0  46.250   926.0   0.088235    6624.0
           12749.0   11.0     1.0  5331.0  45.500  5632.0   0.126173   34685.0
           13100.0   11.0     1.0  5311.0  22.000  1759.0   0.171242   15107.0
           13653.0   10.0     1.0  5311.0  13.125   141.0   0.220930    1337.0
           13936.0   11.0     1.0  2331.0  11.500   270.0  -0.053061    3942.0
           14322.0   11.0     1.0  5311.0  64.750  6934.0   0.024409  154187.0
           16969.0   10.0     1.0  5311.0  42.875  1069.0   0.186851   13828.0
           17072.0   10.0     1.0  5311.0  14.750   777.0   0.026087    5415.0
           17304.0   10.0     1.0  5311.0  24.875  1939.0   0.058511    8150.0
pandas dataframe indexing
Thank you so much. The problem is that I have to group not only by ID but also by the year of the 'DATE' index, meaning I have to somehow get the year out of it :(
– hmmmbob
Nov 11 at 5:23
Is it possible to create some sample data with expected output?
– jezrael
Nov 11 at 5:27
I hope I did that. For example, for each PERMNO I just want to compute the yearly average of volume, so I need to access the DATE index, but I do not know how.
– hmmmbob
Nov 11 at 5:43
Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
– jezrael
Nov 11 at 5:44
It does not throw an error, so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year. Can you tell me how you found that, so I can help myself in the future?
– hmmmbob
Nov 11 at 5:54
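One way to arrive at index.get_level_values(0).year is to inspect the index step by step. The sketch below assumes a small hypothetical frame with the same (DATE, PERMNO) index layout as in the question; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame with the same index layout as in the question.
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("1970-08-31"), 10559.0), (pd.Timestamp("1971-08-31"), 13100.0)],
    names=["DATE", "PERMNO"],
)
df = pd.DataFrame({"VOL": [1, 2]}, index=index)

# A MultiIndex carries one name per level, listed in .names.
level_names = list(df.index.names)

# get_level_values returns a single level as a flat Index;
# it accepts either the level name or its position (0 here).
dates = df.index.get_level_values("DATE")

# For a datetime level the result is a DatetimeIndex,
# which exposes date parts such as .year and .month.
years = dates.year

print(type(df.index).__name__, level_names)
print(type(dates).__name__, list(years))
```

Checking type(df.index) first tells you which documentation page to read: here it is a MultiIndex, and its get_level_values method hands back a DatetimeIndex whose attributes include year.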
edited Nov 11 at 5:42
asked Nov 10 at 19:28
hmmmbob
1 Answer
accepted
You can use transform with the year of the DATE level to get a Series of the same size as the original DataFrame:
print(df)
                    VOL
DATE       PERMNO
1970-08-31 10559.0    1
           10559.0    2
           12749.0    3
1971-08-31 13100.0    4
           13100.0    5

df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')
print(df)
                    VOL  avg
DATE       PERMNO
1970-08-31 10559.0    1  1.5
           10559.0    2  1.5
           12749.0    3  3.0
1971-08-31 13100.0    4  4.5
           13100.0    5  4.5
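For the original goal of finding which months are above the yearly average, the avg column can be compared with VOL directly. This is a minimal, self-contained sketch: the sample frame is hypothetical (distinct month-end dates, made-up volumes), and the above_avg column name is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical sample: a (DATE, PERMNO) MultiIndex and a VOL column.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("1970-08-31"), 10559.0),
        (pd.Timestamp("1970-09-30"), 10559.0),
        (pd.Timestamp("1970-08-31"), 12749.0),
        (pd.Timestamp("1971-08-31"), 13100.0),
        (pd.Timestamp("1971-09-30"), 13100.0),
    ],
    names=["DATE", "PERMNO"],
)
df = pd.DataFrame({"VOL": [1, 2, 3, 4, 5]}, index=index)

# Calendar year of every row, taken from the DATE level of the index.
years = df.index.get_level_values("DATE").year

# Mean VOL per (year, PERMNO), broadcast back to one value per original row.
df["avg"] = df.groupby([years, "PERMNO"])["VOL"].transform("mean")

# True where that month's volume exceeds the yearly average for the same PERMNO.
df["above_avg"] = df["VOL"] > df["avg"]

print(df)
```

Because transform returns a result aligned with the original rows, no explicit loop over IDs is needed; the comparison is a plain element-wise operation.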
answered Nov 11 at 6:10
jezrael