Group data using pandas: how do I keep the group order and do math on the rows of two columns?




















df:



   Time   Name  X  Y
0    00     AA  0  0
1    30     BB  1  1
2    45     CC  2  2
3    60  GG:AB  3  3
4    90  GG:AC  4  4
5   120     AA  5  3


dataGroup = df.groupby([pd.Grouper(key='Time', freq='30s'), 'Name']).sort_values(by=['Timestamp'], ascending=True)


I have tried calling diff() within each group, but it returns NaN or other values I did not expect.



df.groupby('Name', sort=False)['X'].diff()


How do I keep the groupings and the time sort, and compute the difference between each row and the previous row in its group (for both the X and the Y columns)?



Expected output, taking group AA as an example:
XDiff row 1 = (X row 1 - origin (known))
XDiff row 2 = (X row 2 - X row 1)



   Time   Name  X  Y  XDiff  YDiff
0    00     AA  0  0      0      0
5   120     AA  5  3      5      3
1    30     BB  1  1      0      0
6    55     BB  2  3      1      2
2    45     CC  2  2      0      0
3    60  GG:AB  3  3      0      0
4    90  GG:AC  4  4      0      0


It would also be nice to see the total distance for each group (i.e., AA is 5, BB is 1).
My example has only a couple of rows per group, but if a group had 100 rows, diff would give me the distance between each pair of consecutive rows, not the total distance for that group.










  • Can you post the expected output? – harvpan, Nov 16 '18 at 16:07
  • Related / possible duplicate: stackoverflow.com/questions/20648346/… – Evan, Nov 16 '18 at 16:41
  • Can you clarify what you mean by "total distance"? – Evan, Nov 16 '18 at 16:53
  • Possible duplicate of Computing diffs within groups of a dataframe – Evan, Nov 16 '18 at 16:53


















pandas dataframe pandas-groupby














asked Nov 16 '18 at 15:11, edited Nov 16 '18 at 22:13
wegunterjr













1 Answer














Borrowing from https://stackoverflow.com/a/20664760/6672746, you can use transform with a lambda function to calculate the difference between consecutive rows within each Name group, for both X and Y. I also included two lines that set the index to Name and Time (after the groupby) and sort it.



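# Row-to-row difference within each Name group; the first row of a group has no predecessor (NaN), so fill it with 0.
df['x_diff'] = df.groupby(['Name'])['X'].transform(lambda x: x.diff()).fillna(0)
df['y_diff'] = df.groupby(['Name'])['Y'].transform(lambda x: x.diff()).fillna(0)
# Index by (Name, Time) and sort so each group's rows sit together in time order.
df.set_index(["Name", "Time"], inplace=True)
df.sort_index(level=["Name", "Time"], inplace=True)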


Output:



            X  Y  x_diff  y_diff
Name  Time
AA    0     0  0     0.0     0.0
      120   5  3     5.0     3.0
BB    30    1  1     0.0     0.0
CC    45    2  2     0.0     0.0
GG:AB 60    3  3     0.0     0.0
GG:AC 90    4  4     0.0     0.0
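
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's six sample rows and uses the built-in groupby `diff` instead of the lambda. It assumes `Time` is a plain integer number of seconds. The `totals` step assumes "total distance" means the sum of absolute row-to-row steps per group, which the question does not confirm. The names `out`, `diffs`, and `totals` are only illustrative.

```python
import pandas as pd

# Sample rows from the question; Time is assumed to be integer seconds.
df = pd.DataFrame({
    "Time": [0, 30, 45, 60, 90, 120],
    "Name": ["AA", "BB", "CC", "GG:AB", "GG:AC", "AA"],
    "X": [0, 1, 2, 3, 4, 5],
    "Y": [0, 1, 2, 3, 4, 3],
})

# Sort so that each group's rows are adjacent and in time order.
# The original row labels are kept as the index.
out = df.sort_values(["Name", "Time"]).copy()

# GroupBy.diff subtracts the previous row within the same group.
# The first row of each group has no predecessor, so it is NaN until filled.
diffs = out.groupby("Name", sort=False)[["X", "Y"]].diff().fillna(0)
out["x_diff"] = diffs["X"]
out["y_diff"] = diffs["Y"]

# Assumed meaning of "total distance": sum of absolute steps per group.
totals = out[["x_diff", "y_diff"]].abs().groupby(out["Name"]).sum()

print(out)
print(totals)
```

With only these six rows, AA is the only group with more than one row, so every other group's total is 0. Sorting by Name and Time before taking the diff is what keeps the result grouped and in time order.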





answered Nov 16 '18 at 16:43, edited Nov 16 '18 at 16:49
Evan































