pandas data frame iterating over 2 index variables











up vote
1
down vote

favorite












I have a data frame with 2 indexes called "DATE"( it is monthly data) and "ID" and a column variable named Volume. Now I want to iterate over it and fill for every unique ID a new column with the average value of the column Volume in a new column.



The basic idea is to figure out which months are above the yearly avg for every ID.



list(df.index)

(Timestamp('1970-09-30 00:00:00'), 12167.0)

print(df.index.name)

None


I seemed to not find a tutorial to address this :(



Can someone please point me in the right direction



                    SHRCD  EXCHCD   SICCD     PRC     VOL       RET    SHROUT  
DATE PERMNO
1970-08-31 10559.0 10.0 1.0 5311.0 35.000 1692.0 0.030657 12048.0
12626.0 10.0 1.0 5411.0 46.250 926.0 0.088235 6624.0
12749.0 11.0 1.0 5331.0 45.500 5632.0 0.126173 34685.0
13100.0 11.0 1.0 5311.0 22.000 1759.0 0.171242 15107.0
13653.0 10.0 1.0 5311.0 13.125 141.0 0.220930 1337.0
13936.0 11.0 1.0 2331.0 11.500 270.0 -0.053061 3942.0
14322.0 11.0 1.0 5311.0 64.750 6934.0 0.024409 154187.0
16969.0 10.0 1.0 5311.0 42.875 1069.0 0.186851 13828.0
17072.0 10.0 1.0 5311.0 14.750 777.0 0.026087 5415.0
17304.0 10.0 1.0 5311.0 24.875 1939.0 0.058511 8150.0









share|improve this question
























  • Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
    – hmmmbob
    Nov 11 at 5:23










  • Is possible create some sample data with expected output?
    – jezrael
    Nov 11 at 5:27










  • I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
    – hmmmbob
    Nov 11 at 5:43










  • Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
    – jezrael
    Nov 11 at 5:44












  • It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
    – hmmmbob
    Nov 11 at 5:54















up vote
1
down vote

favorite












I have a data frame with 2 indexes called "DATE"( it is monthly data) and "ID" and a column variable named Volume. Now I want to iterate over it and fill for every unique ID a new column with the average value of the column Volume in a new column.



The basic idea is to figure out which months are above the yearly avg for every ID.



list(df.index)

(Timestamp('1970-09-30 00:00:00'), 12167.0)

print(df.index.name)

None


I seemed to not find a tutorial to address this :(



Can someone please point me in the right direction



                    SHRCD  EXCHCD   SICCD     PRC     VOL       RET    SHROUT  
DATE PERMNO
1970-08-31 10559.0 10.0 1.0 5311.0 35.000 1692.0 0.030657 12048.0
12626.0 10.0 1.0 5411.0 46.250 926.0 0.088235 6624.0
12749.0 11.0 1.0 5331.0 45.500 5632.0 0.126173 34685.0
13100.0 11.0 1.0 5311.0 22.000 1759.0 0.171242 15107.0
13653.0 10.0 1.0 5311.0 13.125 141.0 0.220930 1337.0
13936.0 11.0 1.0 2331.0 11.500 270.0 -0.053061 3942.0
14322.0 11.0 1.0 5311.0 64.750 6934.0 0.024409 154187.0
16969.0 10.0 1.0 5311.0 42.875 1069.0 0.186851 13828.0
17072.0 10.0 1.0 5311.0 14.750 777.0 0.026087 5415.0
17304.0 10.0 1.0 5311.0 24.875 1939.0 0.058511 8150.0









share|improve this question
























  • Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
    – hmmmbob
    Nov 11 at 5:23










  • Is possible create some sample data with expected output?
    – jezrael
    Nov 11 at 5:27










  • I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
    – hmmmbob
    Nov 11 at 5:43










  • Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
    – jezrael
    Nov 11 at 5:44












  • It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
    – hmmmbob
    Nov 11 at 5:54













up vote
1
down vote

favorite









up vote
1
down vote

favorite











I have a data frame with 2 indexes called "DATE"( it is monthly data) and "ID" and a column variable named Volume. Now I want to iterate over it and fill for every unique ID a new column with the average value of the column Volume in a new column.



The basic idea is to figure out which months are above the yearly avg for every ID.



list(df.index)

(Timestamp('1970-09-30 00:00:00'), 12167.0)

print(df.index.name)

None


I seemed to not find a tutorial to address this :(



Can someone please point me in the right direction



                    SHRCD  EXCHCD   SICCD     PRC     VOL       RET    SHROUT  
DATE PERMNO
1970-08-31 10559.0 10.0 1.0 5311.0 35.000 1692.0 0.030657 12048.0
12626.0 10.0 1.0 5411.0 46.250 926.0 0.088235 6624.0
12749.0 11.0 1.0 5331.0 45.500 5632.0 0.126173 34685.0
13100.0 11.0 1.0 5311.0 22.000 1759.0 0.171242 15107.0
13653.0 10.0 1.0 5311.0 13.125 141.0 0.220930 1337.0
13936.0 11.0 1.0 2331.0 11.500 270.0 -0.053061 3942.0
14322.0 11.0 1.0 5311.0 64.750 6934.0 0.024409 154187.0
16969.0 10.0 1.0 5311.0 42.875 1069.0 0.186851 13828.0
17072.0 10.0 1.0 5311.0 14.750 777.0 0.026087 5415.0
17304.0 10.0 1.0 5311.0 24.875 1939.0 0.058511 8150.0









share|improve this question















I have a data frame with 2 indexes called "DATE"( it is monthly data) and "ID" and a column variable named Volume. Now I want to iterate over it and fill for every unique ID a new column with the average value of the column Volume in a new column.



The basic idea is to figure out which months are above the yearly avg for every ID.



list(df.index)

(Timestamp('1970-09-30 00:00:00'), 12167.0)

print(df.index.name)

None


I seemed to not find a tutorial to address this :(



Can someone please point me in the right direction



                    SHRCD  EXCHCD   SICCD     PRC     VOL       RET    SHROUT  
DATE PERMNO
1970-08-31 10559.0 10.0 1.0 5311.0 35.000 1692.0 0.030657 12048.0
12626.0 10.0 1.0 5411.0 46.250 926.0 0.088235 6624.0
12749.0 11.0 1.0 5331.0 45.500 5632.0 0.126173 34685.0
13100.0 11.0 1.0 5311.0 22.000 1759.0 0.171242 15107.0
13653.0 10.0 1.0 5311.0 13.125 141.0 0.220930 1337.0
13936.0 11.0 1.0 2331.0 11.500 270.0 -0.053061 3942.0
14322.0 11.0 1.0 5311.0 64.750 6934.0 0.024409 154187.0
16969.0 10.0 1.0 5311.0 42.875 1069.0 0.186851 13828.0
17072.0 10.0 1.0 5311.0 14.750 777.0 0.026087 5415.0
17304.0 10.0 1.0 5311.0 24.875 1939.0 0.058511 8150.0






pandas dataframe indexing






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 11 at 5:42

























asked Nov 10 at 19:28









hmmmbob

412921




412921












  • Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
    – hmmmbob
    Nov 11 at 5:23










  • Is possible create some sample data with expected output?
    – jezrael
    Nov 11 at 5:27










  • I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
    – hmmmbob
    Nov 11 at 5:43










  • Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
    – jezrael
    Nov 11 at 5:44












  • It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
    – hmmmbob
    Nov 11 at 5:54


















  • Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
    – hmmmbob
    Nov 11 at 5:23










  • Is possible create some sample data with expected output?
    – jezrael
    Nov 11 at 5:27










  • I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
    – hmmmbob
    Nov 11 at 5:43










  • Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
    – jezrael
    Nov 11 at 5:44












  • It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
    – hmmmbob
    Nov 11 at 5:54
















Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
– hmmmbob
Nov 11 at 5:23




Thank you so much, the problem is that I have not only groupby ID but also by the year of the 'DATE' index. meaning I have to somehow get the year out of it :(
– hmmmbob
Nov 11 at 5:23












Is possible create some sample data with expected output?
– jezrael
Nov 11 at 5:27




Is possible create some sample data with expected output?
– jezrael
Nov 11 at 5:27












I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
– hmmmbob
Nov 11 at 5:43




I hope i did that, i just want to for example for each PERMNO do the yearly avg of volume, so i need to access the DATE index, but I do not know how.
– hmmmbob
Nov 11 at 5:43












Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
– jezrael
Nov 11 at 5:44






Do you think df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') ?
– jezrael
Nov 11 at 5:44














It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
– hmmmbob
Nov 11 at 5:54




It does not throw an error so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year ..Can you tell me how you found that, so I can help myself in the future?
– hmmmbob
Nov 11 at 5:54












1 Answer
1






active

oldest

votes

















up vote
1
down vote



accepted










You can use transform with year for same size Series like original DataFrame:



print (df)
VOL
DATE PERMNO
1970-08-31 10559.0 1
10559.0 2
12749.0 3
1971-08-31 13100.0 4
13100.0 5

df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')
print (df)
VOL avg
DATE PERMNO
1970-08-31 10559.0 1 1.5
10559.0 2 1.5
12749.0 3 3.0
1971-08-31 13100.0 4 4.5
13100.0 5 4.5





share|improve this answer





















    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53242644%2fpandas-data-frame-iterating-over-2-index-variables%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    1
    down vote



    accepted










    You can use transform with year for same size Series like original DataFrame:



    print (df)
    VOL
    DATE PERMNO
    1970-08-31 10559.0 1
    10559.0 2
    12749.0 3
    1971-08-31 13100.0 4
    13100.0 5

    df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')
    print (df)
    VOL avg
    DATE PERMNO
    1970-08-31 10559.0 1 1.5
    10559.0 2 1.5
    12749.0 3 3.0
    1971-08-31 13100.0 4 4.5
    13100.0 5 4.5





    share|improve this answer

























      up vote
      1
      down vote



      accepted










      You can use transform with year for same size Series like original DataFrame:



      print (df)
      VOL
      DATE PERMNO
      1970-08-31 10559.0 1
      10559.0 2
      12749.0 3
      1971-08-31 13100.0 4
      13100.0 5

      df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')
      print (df)
      VOL avg
      DATE PERMNO
      1970-08-31 10559.0 1 1.5
      10559.0 2 1.5
      12749.0 3 3.0
      1971-08-31 13100.0 4 4.5
      13100.0 5 4.5





      share|improve this answer























        up vote
        1
        down vote



        accepted







        up vote
        1
        down vote



        accepted






        You can use transform with year for same size Series like original DataFrame:



        print (df)
        VOL
        DATE PERMNO
        1970-08-31 10559.0 1
        10559.0 2
        12749.0 3
        1971-08-31 13100.0 4
        13100.0 5

        df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')
        print (df)
        VOL avg
        DATE PERMNO
        1970-08-31 10559.0 1 1.5
        10559.0 2 1.5
        12749.0 3 3.0
        1971-08-31 13100.0 4 4.5
        13100.0 5 4.5





        share|improve this answer












        You can use transform with year for same size Series like original DataFrame:



        print (df)
        VOL
        DATE PERMNO
        1970-08-31 10559.0 1
        10559.0 2
        12749.0 3
        1971-08-31 13100.0 4
        13100.0 5

        df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')
        print (df)
        VOL avg
        DATE PERMNO
        1970-08-31 10559.0 1 1.5
        10559.0 2 1.5
        12749.0 3 3.0
        1971-08-31 13100.0 4 4.5
        13100.0 5 4.5






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 11 at 6:10









        jezrael

        307k20243317




        307k20243317






























             

            draft saved


            draft discarded



















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53242644%2fpandas-data-frame-iterating-over-2-index-variables%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Florida Star v. B. J. F.

            Danny Elfman

            Retrieve a Users Dashboard in Tumblr with R and TumblR. Oauth Issues