Converting a Dataframe into a Series with cells containing arrays in Pandas
Sorry if this has been answered before, but I'm having trouble with the solution.
I have a 2D DataFrame with column names whose elements include both non-null and null values. I'd like to 'flatten' it into a 1D Series that keeps only the non-null data of each column, stored as a list in the Series cell labelled with that column's header.
For example, going from (type pandas.DataFrame):
| asset | name | id  |
----------------------
| a     | john | 001 |
| a     | NaN  | 002 |
| NaN   | dave | 003 |
To (type pandas.Series):
| asset | name         | id              |
------------------------------------------
| [a]   | [john, dave] | [001, 002, 003] |
Thank you!
EDIT: Why I would need this:
I am starting with a large DataFrame that has multiple duplicated attributes with timestamped 'rows'. At any given timestamp, information in the rows could be added or deleted. I have used df.where() to return a DataFrame of the unique values, and am attempting to flatten it down to one attribute collection of 'ids' per row.
In practice, the example table comes from a single GroupBy object.
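Since the example table is one group of a larger grouped frame, here is a minimal sketch of applying the same per-column flattening to every group at once. The "timestamp" column name and the sample rows are made up for illustration (assumptions, not the asker's data); each group is collapsed to a Series of lists, and the results are stacked into one row per timestamp.

```python
import numpy as np
import pandas as pd

# Hypothetical input: "timestamp" is an assumed grouping column, rows are sample data.
df = pd.DataFrame({
    "timestamp": [1, 1, 1, 2, 2],
    "asset": ["a", "a", np.nan, "b", np.nan],
    "name": ["john", np.nan, "dave", np.nan, "eve"],
    "id": ["001", "002", "003", "004", "004"],
})

def flatten(frame):
    """Collapse each column to a list of its unique non-null values."""
    return pd.Series({c: frame[c].dropna().unique().tolist() for c in frame.columns})

# Exclude the grouping column from the flattened values.
value_cols = [c for c in df.columns if c != "timestamp"]

# Build one Series per group (columns = timestamps), then transpose so each
# timestamp becomes a row and each attribute a column of lists.
flat = pd.DataFrame(
    {ts: flatten(group[value_cols]) for ts, group in df.groupby("timestamp")}
).T
print(flat)
```

Iterating over the groups keeps every step to ordinary Series and DataFrame construction, at the cost of a Python-level loop over the groups.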
python pandas dataframe series
asked Nov 10 at 23:15 by Paul Choi, edited Nov 10 at 23:26
Can you please explain why you would want this?
– coldspeed, Nov 10 at 23:16
1 Answer
Accepted answer:
Instantiate a new Series using a dict comprehension (this should be faster than an apply-based solution):
import pandas as pd
pd.Series({c: df[c].dropna().unique().tolist() for c in df.columns})
asset [a]
name [john, dave]
id [1, 2, 3]
dtype: object
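As a self-contained check, here is the same expression run against the question's example frame, typed in by hand. The ids are assumed to be strings so the leading zeros survive; if the id column is numeric, the lists come out as integers, as in the output above. Note that unique() also drops the repeated 'a' and keeps values in order of first appearance.

```python
import numpy as np
import pandas as pd

# The question's example frame; ids are assumed to be stored as strings.
df = pd.DataFrame({
    "asset": ["a", "a", np.nan],
    "name": ["john", np.nan, "dave"],
    "id": ["001", "002", "003"],
})

# Per column: drop nulls, keep unique values in order of first appearance,
# and store the result as a plain Python list in one Series cell.
flat = pd.Series({c: df[c].dropna().unique().tolist() for c in df.columns})
print(flat)
```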
If you want a single-row DataFrame instead, use:
pd.Series(
    {c: df[c].dropna().unique().tolist() for c in df.columns}
).to_frame().T
asset name id
0 [a] [john, dave] [1, 2, 3]
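If the Series is only an intermediate step, a single-row DataFrame can also be built directly by wrapping the dict in a one-element list, which pandas reads as a list of records (one dict per row). This is a sketch of an alternative, not part of the original answer; the sample frame again assumes string ids.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "asset": ["a", "a", np.nan],
    "name": ["john", np.nan, "dave"],
    "id": ["001", "002", "003"],
})

flat = {c: df[c].dropna().unique().tolist() for c in df.columns}

# A one-element list of dicts is treated as a single record, so each list
# stays inside one cell of row 0 instead of being spread down a column.
one_row = pd.DataFrame([flat])
print(one_row)
```

Both routes give the same one-row frame; the to_frame().T version is convenient when you already have the Series.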
answered Nov 10 at 23:18 by coldspeed