pandas data frame iterating over 2 index variables

up vote
1
down vote

favorite

I have a data frame with 2 indexes called "DATE"( it is monthly data) and "ID" and a column variable named Volume. Now I want to iterate over it and fill for every unique ID a new column with the average value of the column Volume in a new column.

The basic idea is to figure out which months are above the yearly avg for every ID.

list(df.index)



(Timestamp('1970-09-30 00:00:00'), 12167.0)



print(df.index.name)



None

I seemed to not find a tutorial to address this :(

Can someone please point me in the right direction

                    SHRCD  EXCHCD   SICCD     PRC     VOL       RET    SHROUT  

DATE       PERMNO                                                               

1970-08-31 10559.0   10.0     1.0  5311.0  35.000  1692.0  0.030657   12048.0   

           12626.0   10.0     1.0  5411.0  46.250   926.0  0.088235    6624.0   

           12749.0   11.0     1.0  5331.0  45.500  5632.0  0.126173   34685.0   

           13100.0   11.0     1.0  5311.0  22.000  1759.0  0.171242   15107.0   

           13653.0   10.0     1.0  5311.0  13.125   141.0  0.220930    1337.0   

           13936.0   11.0     1.0  2331.0  11.500   270.0 -0.053061    3942.0   

           14322.0   11.0     1.0  5311.0  64.750  6934.0  0.024409  154187.0   

           16969.0   10.0     1.0  5311.0  42.875  1069.0  0.186851   13828.0   

           17072.0   10.0     1.0  5311.0  14.750   777.0  0.026087    5415.0   

           17304.0   10.0     1.0  5311.0  24.875  1939.0  0.058511    8150.0

edited Nov 11 at 5:42

asked Nov 10 at 19:28

hmmmbob

412921

Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
– hmmmbob
Nov 11 at 5:23

Is possible create some sample data with expected output?
– jezrael
Nov 11 at 5:27

I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
– hmmmbob
Nov 11 at 5:43

Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
– jezrael
Nov 11 at 5:44

It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
– hmmmbob
Nov 11 at 5:54

|
show 1 more comment

up vote
1
down vote

favorite

The basic idea is to figure out which months are above the yearly avg for every ID.

list(df.index)



(Timestamp('1970-09-30 00:00:00'), 12167.0)



print(df.index.name)



None

I seemed to not find a tutorial to address this :(

Can someone please point me in the right direction

                    SHRCD  EXCHCD   SICCD     PRC     VOL       RET    SHROUT  

DATE       PERMNO                                                               

1970-08-31 10559.0   10.0     1.0  5311.0  35.000  1692.0  0.030657   12048.0   

           12626.0   10.0     1.0  5411.0  46.250   926.0  0.088235    6624.0   

           12749.0   11.0     1.0  5331.0  45.500  5632.0  0.126173   34685.0   

           13100.0   11.0     1.0  5311.0  22.000  1759.0  0.171242   15107.0   

           13653.0   10.0     1.0  5311.0  13.125   141.0  0.220930    1337.0   

           13936.0   11.0     1.0  2331.0  11.500   270.0 -0.053061    3942.0   

           14322.0   11.0     1.0  5311.0  64.750  6934.0  0.024409  154187.0   

           16969.0   10.0     1.0  5311.0  42.875  1069.0  0.186851   13828.0   

           17072.0   10.0     1.0  5311.0  14.750   777.0  0.026087    5415.0   

           17304.0   10.0     1.0  5311.0  24.875  1939.0  0.058511    8150.0

edited Nov 11 at 5:42

asked Nov 10 at 19:28

hmmmbob

412921

Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
– hmmmbob
Nov 11 at 5:23

Is possible create some sample data with expected output?
– jezrael
Nov 11 at 5:27

I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
– hmmmbob
Nov 11 at 5:43

Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
– jezrael
Nov 11 at 5:44

It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
– hmmmbob
Nov 11 at 5:54

|
show 1 more comment

up vote
1
down vote

favorite

The basic idea is to figure out which months are above the yearly avg for every ID.

list(df.index)



(Timestamp('1970-09-30 00:00:00'), 12167.0)



print(df.index.name)



None

I seemed to not find a tutorial to address this :(

Can someone please point me in the right direction

                    SHRCD  EXCHCD   SICCD     PRC     VOL       RET    SHROUT  

DATE       PERMNO                                                               

1970-08-31 10559.0   10.0     1.0  5311.0  35.000  1692.0  0.030657   12048.0   

           12626.0   10.0     1.0  5411.0  46.250   926.0  0.088235    6624.0   

           12749.0   11.0     1.0  5331.0  45.500  5632.0  0.126173   34685.0   

           13100.0   11.0     1.0  5311.0  22.000  1759.0  0.171242   15107.0   

           13653.0   10.0     1.0  5311.0  13.125   141.0  0.220930    1337.0   

           13936.0   11.0     1.0  2331.0  11.500   270.0 -0.053061    3942.0   

           14322.0   11.0     1.0  5311.0  64.750  6934.0  0.024409  154187.0   

           16969.0   10.0     1.0  5311.0  42.875  1069.0  0.186851   13828.0   

           17072.0   10.0     1.0  5311.0  14.750   777.0  0.026087    5415.0   

           17304.0   10.0     1.0  5311.0  24.875  1939.0  0.058511    8150.0

edited Nov 11 at 5:42

asked Nov 10 at 19:28

hmmmbob

412921

The basic idea is to figure out which months are above the yearly avg for every ID.

list(df.index)



(Timestamp('1970-09-30 00:00:00'), 12167.0)



print(df.index.name)



None

I seemed to not find a tutorial to address this :(

Can someone please point me in the right direction

                    SHRCD  EXCHCD   SICCD     PRC     VOL       RET    SHROUT  

DATE       PERMNO                                                               

1970-08-31 10559.0   10.0     1.0  5311.0  35.000  1692.0  0.030657   12048.0   

           12626.0   10.0     1.0  5411.0  46.250   926.0  0.088235    6624.0   

           12749.0   11.0     1.0  5331.0  45.500  5632.0  0.126173   34685.0   

           13100.0   11.0     1.0  5311.0  22.000  1759.0  0.171242   15107.0   

           13653.0   10.0     1.0  5311.0  13.125   141.0  0.220930    1337.0   

           13936.0   11.0     1.0  2331.0  11.500   270.0 -0.053061    3942.0   

           14322.0   11.0     1.0  5311.0  64.750  6934.0  0.024409  154187.0   

           16969.0   10.0     1.0  5311.0  42.875  1069.0  0.186851   13828.0   

           17072.0   10.0     1.0  5311.0  14.750   777.0  0.026087    5415.0   

           17304.0   10.0     1.0  5311.0  24.875  1939.0  0.058511    8150.0

pandas dataframe indexing

edited Nov 11 at 5:42

asked Nov 10 at 19:28

hmmmbob

412921

edited Nov 11 at 5:42

asked Nov 10 at 19:28

hmmmbob

412921

edited Nov 11 at 5:42

asked Nov 10 at 19:28

hmmmbob

412921

asked Nov 10 at 19:28

hmmmbob

412921

asked Nov 10 at 19:28

hmmmbob

412921

Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
– hmmmbob
Nov 11 at 5:23

Is possible create some sample data with expected output?
– jezrael
Nov 11 at 5:27

I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
– hmmmbob
Nov 11 at 5:43

Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
– jezrael
Nov 11 at 5:44

It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
– hmmmbob
Nov 11 at 5:54

|
show 1 more comment

Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
– hmmmbob
Nov 11 at 5:23

Is possible create some sample data with expected output?
– jezrael
Nov 11 at 5:27

I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
– hmmmbob
Nov 11 at 5:43

Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
– jezrael
Nov 11 at 5:44

It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
– hmmmbob
Nov 11 at 5:54

Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
– hmmmbob
Nov 11 at 5:23

Is possible create some sample data with expected output?
– jezrael
Nov 11 at 5:27

I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
– hmmmbob
Nov 11 at 5:43

Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
– jezrael
Nov 11 at 5:44

It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
– hmmmbob
Nov 11 at 5:54

|
show 1 more comment

1 Answer
1

active

oldest

votes

up vote
1
down vote

accepted

You can use transform with year for same size Series like original DataFrame:

print (df)

                    VOL

DATE       PERMNO      

1970-08-31 10559.0    1

           10559.0    2

           12749.0    3

1971-08-31 13100.0    4

           13100.0    5



df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')

print (df)

                    VOL  avg

DATE       PERMNO           

1970-08-31 10559.0    1  1.5

           10559.0    2  1.5

           12749.0    3  3.0

1971-08-31 13100.0    4  4.5

           13100.0    5  4.5

answered Nov 11 at 6:10

jezrael

307k20243317

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53242644%2fpandas-data-frame-iterating-over-2-index-variables%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
1
down vote

accepted

You can use transform with year for same size Series like original DataFrame:

print (df)

                    VOL

DATE       PERMNO      

1970-08-31 10559.0    1

           10559.0    2

           12749.0    3

1971-08-31 13100.0    4

           13100.0    5



df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')

print (df)

                    VOL  avg

DATE       PERMNO           

1970-08-31 10559.0    1  1.5

           10559.0    2  1.5

           12749.0    3  3.0

1971-08-31 13100.0    4  4.5

           13100.0    5  4.5

answered Nov 11 at 6:10

jezrael

307k20243317

add a comment |

up vote
1
down vote

accepted

You can use transform with year for same size Series like original DataFrame:

print (df)

                    VOL

DATE       PERMNO      

1970-08-31 10559.0    1

           10559.0    2

           12749.0    3

1971-08-31 13100.0    4

           13100.0    5



df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')

print (df)

                    VOL  avg

DATE       PERMNO           

1970-08-31 10559.0    1  1.5

           10559.0    2  1.5

           12749.0    3  3.0

1971-08-31 13100.0    4  4.5

           13100.0    5  4.5

answered Nov 11 at 6:10

jezrael

307k20243317

add a comment |

up vote
1
down vote

accepted

You can use transform with year for same size Series like original DataFrame:

print (df)

                    VOL

DATE       PERMNO      

1970-08-31 10559.0    1

           10559.0    2

           12749.0    3

1971-08-31 13100.0    4

           13100.0    5



df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')

print (df)

                    VOL  avg

DATE       PERMNO           

1970-08-31 10559.0    1  1.5

           10559.0    2  1.5

           12749.0    3  3.0

1971-08-31 13100.0    4  4.5

           13100.0    5  4.5

answered Nov 11 at 6:10

jezrael

307k20243317

You can use transform with year for same size Series like original DataFrame:

print (df)

                    VOL

DATE       PERMNO      

1970-08-31 10559.0    1

           10559.0    2

           12749.0    3

1971-08-31 13100.0    4

           13100.0    5



df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')

print (df)

                    VOL  avg

DATE       PERMNO           

1970-08-31 10559.0    1  1.5

           10559.0    2  1.5

           12749.0    3  3.0

1971-08-31 13100.0    4  4.5

           13100.0    5  4.5

answered Nov 11 at 6:10

jezrael

307k20243317

answered Nov 11 at 6:10

jezrael

307k20243317

answered Nov 11 at 6:10

jezrael

307k20243317

answered Nov 11 at 6:10

jezrael

307k20243317

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ndtyjky