Frequency of repetitive position in pandas data frame

Hi, I am trying to find repeated positions in the following data frame:

import pandas as pd

data = pd.DataFrame()
data['league'] = ['A','A','A','A','A','A','B','B','B']
data['Team'] = ['X','X','X','Y','Y','Y','Z','Z','Z']
data['week'] = [1,2,3,1,2,3,1,2,3]
data['position'] = [1,1,2,2,2,1,2,3,4]


I want to compare each row's position with the position in the previous row: if it is the same, I will assign 0; if it is different, I will assign 1.
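A minimal sketch of the comparison rule described above, assuming the first row (which has no previous row) should get 0. It uses Series.shift rather than the diff used in the answers, so the same pattern also works for non-numeric columns:

```python
import pandas as pd

data = pd.DataFrame({
    'league': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'position': [1, 1, 2, 2, 2, 1, 2, 3, 4],
})

prev = data['position'].shift()         # previous row's position; NaN for the first row
changed = data['position'].ne(prev)     # True where the position differs from the previous row
changed = changed & prev.notna()        # first row has nothing to compare with: treat as unchanged
data['frequency'] = changed.astype(int)

print(data['frequency'].tolist())       # [0, 0, 1, 0, 0, 1, 1, 1, 1]
```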



My expected outcome will be as follow:



[image: expected output]



That is, I want to group by (league, Team and week) and work out the frequency.
Can anyone advise how to do that in pandas?
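One reading of the grouped version, sketched under an assumption: "previous row" means the previous week of the same league and Team, because grouping by week as well leaves a single row per group. The sketch sorts by week and compares within each (league, Team) group; each team's first week gets 0:

```python
import pandas as pd

data = pd.DataFrame({
    'league': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'position': [1, 1, 2, 2, 2, 1, 2, 3, 4],
})

# Assumption: changes are tracked per (league, Team), in week order
data = data.sort_values(['league', 'Team', 'week'])
prev = data.groupby(['league', 'Team'])['position'].shift()   # NaN in each team's first week
data['frequency'] = (data['position'].ne(prev) & prev.notna()).astype(int)

print(data['frequency'].tolist())   # [0, 0, 1, 0, 0, 1, 0, 1, 1]
```

Under this reading, row 6 (team Z, week 1) comes out as 0, whereas comparing across the whole frame gives 1 for that row because it looks at team Y's last row.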



Thanks,



Zep

Tags: python pandas

edited Nov 12 at 9:19
asked Nov 12 at 9:00 by Zephyr


2 Answers

          Use diff, and compare against 0:



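          import pandas as pd

          df = data                               # the frame built in the question
          v = df.position.diff()                  # change from the previous row; NaN in the first row
          v[0] = 0                                # label-based: hits the first row with the default RangeIndex
          df['frequency'] = v.ne(0).astype(int)   # 1 where the position changed, 0 otherwise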

          print(df)
             league  Team  week  position  frequency
          0       A     X     1         1          0
          1       A     X     2         1          0
          2       A     X     3         2          1
          3       A     Y     1         2          0
          4       A     Y     2         2          0
          5       A     Y     3         1          1
          6       B     Z     1         2          1
          7       B     Z     2         3          1
          8       B     Z     3         4          1




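          For performance reasons, you should try to avoid a fillna call. Timings in IPython on a 900,000-row copy of the frame (the %%timeit cell magic has to start its own cell):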



          df = pd.concat([df] * 100000, ignore_index=True)   # 9 rows x 100000 = 900,000 rows

          %timeit df['frequency'] = df['position'].diff().abs().fillna(0,downcast='infer')
          83.7 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

          %%timeit
          v = df.position.diff()
          v[0] = 0
          df['frequency'] = v.ne(0).astype(int)
          10.9 ms ± 217 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
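The %timeit and %%timeit magics only run in IPython or Jupyter. A rough plain-Python equivalent using the standard-library timeit module could look like the sketch below. It is an approximation, not the original benchmark: the fillna variant uses fillna(0).astype(int) instead of downcast='infer' (deprecated in recent pandas releases), the first row is reset positionally with iloc, and actual timings depend on the machine and pandas version:

```python
import timeit

import pandas as pd

small = pd.DataFrame({'position': [1, 1, 2, 2, 2, 1, 2, 3, 4]})
df = pd.concat([small] * 100000, ignore_index=True)   # 900,000 rows


def with_fillna():
    # diff -> abs -> fillna (note: abs keeps the size of each change, not just 0/1)
    return df['position'].diff().abs().fillna(0).astype(int)


def with_ne():
    # diff -> overwrite the first NaN -> compare against 0
    v = df['position'].diff()
    v.iloc[0] = 0
    return v.ne(0).astype(int)


for func in (with_fillna, with_ne):
    # best of 3 repeats, 10 calls each, reported per call
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1000:.1f} ms per call")
```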




          To extend this answer to work within groups, use:



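          import numpy as np

          # note: in the sample, each (league, Team, week) group is a single row
          v = df.groupby(['league', 'Team', 'week']).position.diff()
          v[np.isnan(v)] = 0   # the first row of each group has no previous row

          df['frequency'] = v.ne(0).astype(int)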
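To see how the choice of groupby keys affects this, check the group sizes. This sketch rebuilds the question's sample data and shows that every (league, Team, week) group has exactly one row, so diff has nothing to compare within any group and every value is NaN before it is set to 0:

```python
import pandas as pd

data = pd.DataFrame({
    'league': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'position': [1, 1, 2, 2, 2, 1, 2, 3, 4],
})

keys = ['league', 'Team', 'week']
sizes = data.groupby(keys).size()
print(sizes.unique())   # [1]: every group holds a single row

# With one row per group, diff never has a previous row to subtract
v = data.groupby(keys)['position'].diff()
print(v.isna().all())   # True
```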

          answered Nov 12 at 9:20 by coldspeed

          • @coldspeed, what about simply data['Freq'] = data.position.diff().fillna("0")?
            – pygo
            Nov 12 at 9:27










          • @pygo Simple but slower, hence avoided.
            – coldspeed
            Nov 12 at 9:28










          • Thanks coldspeed. What if the position in week 1 has to be zero, since it doesn't have any previous value when we group by? I am tracking position changes from week 1 (that is just the start of tracking).
            – Zephyr
            Nov 12 at 9:29










          • @Zephyr week 1 is zero by default for all groups (as per my understanding of your problem).
            – coldspeed
            Nov 12 at 9:34










          • @coldspeed, what does v[0] = 0 do? Is it assigning zero?
            – pygo
            Nov 12 at 9:35
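Regarding what v[0] = 0 does: diff leaves NaN in the first row, and v[0] = 0 replaces it. With an integer index, v[0] is a label lookup, so it reaches the first row only when the row labelled 0 comes first, as with the default RangeIndex. The sketch below uses a hypothetical out-of-order index (as can happen after sorting) to compare it with the positional v.iloc[0]:

```python
import pandas as pd

# Hypothetical: same kind of positions, but index labels are not 0..n-1 in order
s = pd.Series([1, 1, 2, 2], index=[3, 0, 2, 1])

v = s.diff()          # NaN at the first *position* (label 3)
print(v.tolist())     # [nan, 0.0, 1.0, 0.0]

by_label = v.copy()
by_label[0] = 0       # label 0 is the second row, so the NaN at label 3 survives
by_pos = v.copy()
by_pos.iloc[0] = 0    # position 0 is the first row, which is the one meant

print(by_label.ne(0).astype(int).tolist())   # [1, 0, 1, 0]
print(by_pos.ne(0).astype(int).tolist())     # [0, 0, 1, 0]
```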

          Use diff and abs with fillna:



          # abs() makes decreases (e.g. 2 -> 1) positive; fillna(0) handles the first row
          data['frequency'] = data['position'].diff().abs().fillna(0).astype('int64')

          print(data)
             league  Team  week  position  frequency
          0       A     X     1         1          0
          1       A     X     2         1          0
          2       A     X     3         2          1
          3       A     Y     1         2          0
          4       A     Y     2         2          0
          5       A     Y     3         1          1
          6       B     Z     1         2          1
          7       B     Z     2         3          1
          8       B     Z     3         4          1
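A caveat on diff().abs(): it gives the size of the change rather than a 0/1 flag. In the sample every change is a single place, so both coincide. With a hypothetical jump of two places they differ, and comparing against 0 keeps the result a flag:

```python
import pandas as pd

# Hypothetical positions with a jump of two places (2 -> 4)
s = pd.Series([2, 2, 4, 3])

magnitude = s.diff().abs().fillna(0).astype(int)   # size of each change
flag = s.diff().fillna(0).ne(0).astype(int)        # 1 if the position changed, else 0

print(magnitude.tolist())   # [0, 0, 2, 1]
print(flag.tolist())        # [0, 0, 1, 1]
```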




          Using groupby with these keys gives all zeros: each (league, Team, week) group contains only one row, so diff finds no previous row within any group.



          data.groupby(['league', 'Team', 'week'])['position'].diff().fillna(0).astype('int64')

          0    0
          1    0
          2    0
          3    0
          4    0
          5    0
          6    0
          7    0
          8    0
          Name: position, dtype: int64

          edited Nov 12 at 9:28
          answered Nov 12 at 9:03 by Sandeep Kadapa

          • Thanks Sandeep. If I want to group by league, Team and week and then work out the frequency, how would I add that? The sample data frame is already sorted, but the actual data is in random order.
            – Zephyr
            Nov 12 at 9:21
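For data in random order, one option (a sketch, again assuming changes are tracked per league and Team in week order) is to sort inside the calculation and rely on index alignment when assigning the column back, so the original row order does not need to be restored. The shuffle only simulates unsorted input:

```python
import pandas as pd

data = pd.DataFrame({
    'league': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'week': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'position': [1, 1, 2, 2, 2, 1, 2, 3, 4],
})
# Simulate unsorted input; index labels travel with their rows
shuffled = data.sample(frac=1, random_state=0).copy()

# Assumption: compare each week with the previous week of the same league and Team
ordered = shuffled.sort_values(['league', 'Team', 'week'])
prev = ordered.groupby(['league', 'Team'])['position'].shift()
flag = (ordered['position'].ne(prev) & prev.notna()).astype(int)

# Column assignment aligns on index labels, not on row order
shuffled['frequency'] = flag
print(shuffled.sort_index()['frequency'].tolist())   # [0, 0, 1, 0, 0, 1, 0, 1, 1]
```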