Group data using pandas, but how do I keep the order of the groups and do math on the rows of two of the columns?
df:
   Time   Name  X  Y
0    00     AA  0  0
1    30     BB  1  1
2    45     CC  2  2
3    60  GG:AB  3  3
4    90  GG:AC  4  4
5   120     AA  5  3
dataGroup = df.groupby([pd.Grouper(key='Time', freq='30s'), 'Name']).sort_values(by=['Timestamp'], ascending=True)
I have tried doing a diff() on the row, but it is returning NaN or something not expected.
df.groupby('Name', sort=False)['X'].diff()
How do I keep the groupings and the time sort, and take the diff between each row and its previous row (for both the X and the Y columns)?
Expected output:
Taking group AA as an example, XDiff would be:
XDiff row 1 = (X row1 - origin (known))
XDiff row 2 = (X row2 - X row1)
   Time   Name  X  Y  XDiff  YDiff
0    00     AA  0  0      0      0
5   120     AA  5  3      5      3
1    30     BB  1  1      0      0
6    55     BB  2  3      1      2
2    45     CC  2  2      0      0
3    60  GG:AB  3  3      0      0
4    90  GG:AC  4  4      0      0
It would also be nice to see the total distance for each group (i.e., AA is 5, BB is 1).
In my example I only have a couple of rows for each group, but what if there were 100 of them? The diff would give me the distance between any two consecutive rows, but not the total distance for that group.
pandas dataframe pandas-groupby
Can you post the expected output?
– harvpan
Nov 16 '18 at 16:07
related / possible duplicate: stackoverflow.com/questions/20648346/…
– Evan
Nov 16 '18 at 16:41
Can you clarify what you mean by "total distance"?
– Evan
Nov 16 '18 at 16:53
Possible duplicate of Computing diffs within groups of a dataframe
– Evan
Nov 16 '18 at 16:53
asked Nov 16 '18 at 15:11, edited Nov 16 '18 at 22:13 – wegunterjr
1 Answer
Ripping off https://stackoverflow.com/a/20664760/6672746, you can use a lambda function to calculate the difference between rows for X and Y. I also included two lines to set the index (after the groupby) and sort it.
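# Difference between each row and the previous row of the same Name;
# the first row of each group has no predecessor (NaN), so fill it with 0.
df['x_diff'] = df.groupby(['Name'])['X'].transform(lambda x: x.diff()).fillna(0)
df['y_diff'] = df.groupby(['Name'])['Y'].transform(lambda x: x.diff()).fillna(0)
# Index by (Name, Time) and sort, so rows are grouped by name and ordered by time.
df.set_index(["Name", "Time"], inplace=True)
df.sort_index(level=["Name", "Time"], inplace=True)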
Output:
            X  Y  x_diff  y_diff
Name  Time
AA    0     0  0     0.0     0.0
      120   5  3     5.0     3.0
BB    30    1  1     0.0     0.0
CC    45    2  2     0.0     0.0
GG:AB 60    3  3     0.0     0.0
GG:AC 90    4  4     0.0     0.0
answered Nov 16 '18 at 16:43, edited Nov 16 '18 at 16:49 – Evan
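The per-group diff described in the answer, plus the "total distance" the question asks about, could be sketched end to end roughly as follows. This is a minimal sketch under stated assumptions, not the answerer's code: Time is treated as plain integer seconds, the extra BB row (index 6) is taken from the question's expected-output table because it is missing from the input table, and "total distance" is read as the sum of absolute per-row differences within each group. The name `totals` is made up for illustration; `XDiff` and `YDiff` follow the question's column names.

```python
import pandas as pd

# Sample data from the question. Row 6 (Time 55, BB) appears only in the
# question's expected output, so including it here is an assumption.
df = pd.DataFrame(
    {
        "Time": [0, 30, 45, 60, 90, 120, 55],
        "Name": ["AA", "BB", "CC", "GG:AB", "GG:AC", "AA", "BB"],
        "X": [0, 1, 2, 3, 4, 5, 2],
        "Y": [0, 1, 2, 3, 4, 3, 3],
    }
)

# Order rows by group, then by time within each group, so that diff()
# compares each row with the chronologically previous row of the same Name.
df = df.sort_values(["Name", "Time"])

# GroupBy.diff() handles both columns in one call. The first row of every
# group has no predecessor and comes back as NaN, which is replaced by 0.
diffs = df.groupby("Name", sort=False)[["X", "Y"]].diff().fillna(0)
df["XDiff"] = diffs["X"]
df["YDiff"] = diffs["Y"]

# Assumed meaning of "total distance": sum of absolute step sizes per group.
totals = df[["XDiff", "YDiff"]].abs().groupby(df["Name"]).sum()

print(df)
print(totals)
```

With this reading, `totals` has one row per Name, so a group with 100 rows still collapses to a single total per column. If the rows are 2-D positions, a Euclidean step length computed from `XDiff` and `YDiff` before summing may be closer to what is wanted; the question does not say which is intended.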