Search for partial strings in header (Python pandas)
I think I've read all similar posts and haven't found what I need.
I have a bunch of .csv files that are similar in principle but may have slightly different header names, differently positioned columns, etc.
I read them using pd.read_csv:
df = pd.read_csv('MyFile.csv', delimiter=';')
Here is part of a sample CSV file header:
Index(['1. Datum', '2. Zeit', '3. Tunnellaenge. m',
'4. Vermessung: Hor. Ablage der Maschine. mm',
'5. Vermessung: Vert. Ablage der Maschine. mm',
………...
'21. SR:Drehzahl. rpm', '22. SR:Erddruck Schild. bar',
'23. STZ:Gesamtkraft. kN', 'Unnamed: 23'],
dtype='object')
I want my code to look at the header and find the column I need based on partial strings.
For instance, I always need the column '3. Tunnellaenge. m'. The name usually doesn't change, so I would use:
df['length'] = df.filter(like='laenge')
It usually works, but what if I want to search for the keyword 'laenge' and/or 'length'?
Or take the header '4. Vermessung: Hor. Ablage der Maschine. mm'. Here I want df.filter to return the column whose name includes 'Hor' AND 'Maschine'. How can I do that? I also tried the regex argument, but it didn't work for me. Would it be better to use str.contains()?
This is important because I have many different CSV files and don't want to adjust the code every time.
Thank you.
python pandas dataframe
edited Nov 15 '18 at 9:22
Masoud Zarjani
4051312
asked Nov 15 '18 at 8:13
Ziga
175
1 Answer
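Use boolean masks built from the column names with str.contains():
# True for every column whose name contains the substring
m1 = df.columns.str.contains('laenge')
m2 = df.columns.str.contains('length')
# | keeps columns matching either keyword ('laenge' and/or 'length');
# use & instead to require both, e.g. 'Hor' and 'Maschine'
m = m1 | m2
df1 = df.loc[:, m]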
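The mask approach can be sketched for both cases from the question. This is only an illustration: the column names are copied from the header shown in the question, while the data values and the variable names (either, both, length_cols, hor_cols) are made up.

```python
import pandas as pd

# Hypothetical frame: column names from the question's header, values invented.
df = pd.DataFrame({
    '1. Datum': ['01.01.2018', '02.01.2018'],
    '3. Tunnellaenge. m': [10.5, 11.0],
    '4. Vermessung: Hor. Ablage der Maschine. mm': [1, 2],
    '5. Vermessung: Vert. Ablage der Maschine. mm': [3, 4],
})

cols = df.columns

# OR: the column name contains 'laenge' or 'length'
either = cols.str.contains('laenge') | cols.str.contains('length')

# AND: the column name contains both 'Hor' and 'Maschine'
both = cols.str.contains('Hor') & cols.str.contains('Maschine')

length_cols = df.loc[:, either].columns.tolist()
hor_cols = df.loc[:, both].columns.tolist()
print(length_cols)
print(hor_cols)
```

Note that str.contains is case-sensitive by default; passing case=False would also match 'Laenge' or 'hor'.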
How about df.columns.str.contains(r'laenge|length', regex=True)?
– Vivek Kalyanarangan
Nov 15 '18 at 8:35
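The regex idea from this comment also works directly with df.filter(regex=...), which the question mentions trying. A minimal sketch, assuming an empty frame with three of the question's column names; the lookahead pattern for the AND case is one possible way to write it, not something taken from the thread.

```python
import pandas as pd

# Hypothetical empty frame; only the column names matter here.
df = pd.DataFrame(columns=[
    '3. Tunnellaenge. m',
    '4. Vermessung: Hor. Ablage der Maschine. mm',
    '5. Vermessung: Vert. Ablage der Maschine. mm',
])

# Alternation: names containing 'laenge' OR 'length'
either = df.filter(regex=r'laenge|length')

# Two lookaheads: names containing 'Hor' AND 'Maschine', in any order
both = df.filter(regex=r'(?=.*Hor)(?=.*Maschine)')

either_names = list(either.columns)
both_names = list(both.columns)
print(either_names)
print(both_names)
```

df.filter applies the pattern as a search on each column label, so no anchors or leading wildcards are needed for a simple substring match.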
And how could I rename the column in df1 and then access it as df['Name']? Using df1 alone is fine for plotting, but I can't do computations against other columns because it isn't part of the original DataFrame, or is it?
– Ziga
Nov 15 '18 at 11:19
So with df['New Column'] = df.loc[:, m] I can add a new column. I'm still not sure how to replace the column's name, though.
– Ziga
Nov 15 '18 at 12:22
@Ziga - So is only one column always returned?
– jezrael
Nov 15 '18 at 12:24
I want the column found by df1 = df.loc[:, m] to be renamed and then used as df['New Column Name'].
– Ziga
Nov 15 '18 at 14:52
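One way to do the rename asked about here, sketched under the assumption that the pattern matches exactly one column: look up the matching name and pass it to df.rename. The helper rename_match is hypothetical (it is not part of pandas or of the answer above), and the sample data is invented.

```python
import pandas as pd

def rename_match(df, pattern, new_name):
    """Hypothetical helper: rename the single column matching `pattern`."""
    matches = df.columns[df.columns.str.contains(pattern)]
    if len(matches) != 1:
        # Refuse to guess when zero or several columns match.
        raise ValueError(
            f"expected exactly one match for {pattern!r}, got {list(matches)}"
        )
    return df.rename(columns={matches[0]: new_name})

# Invented sample data with two column names from the question's header.
df = pd.DataFrame({
    '2. Zeit': ['08:00', '08:05'],
    '3. Tunnellaenge. m': [10.5, 11.0],
})

df = rename_match(df, r'laenge|length', 'length')

# The renamed column is part of df, so it can be used in computations.
doubled = (df['length'] * 2).tolist()
print(list(df.columns))
print(doubled)
```

Raising an error on zero or multiple matches is a design choice for this sketch: with many differently named CSV files, a silent wrong match is harder to notice than an exception.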
answered Nov 15 '18 at 8:17
jezrael
344k25297370