Best way to fix an inconsistent CSV file in Python
I have a CSV file that is not consistent: some rows have a middle name and some do not, as shown below. I don't know the best way to fix this. If a middle name exists, it is always in the second position; if it doesn't, the last name is in the second position.
john,doe,52,florida
jane,mary,doe,55,texas
fred,johnson,23,maine
wally,mark,david,44,florida
python-3.x csv
What exactly do you want to do? Is your Python program creating this CSV file, or are you meant to fix the gap problem with your Python program?
– gkapellmann
Nov 13 '18 at 13:02
It's creating it. I struggle with programming. I am a network engineer (and a good one, actually) but I do Python because it is useful and to keep my ego in check; it doesn't come naturally to me. So I output into a text file; the output is just not consistent because of the source, so I decided to make it easier for myself and try to fix the flawed CSV file rather than the initial parse; then I could see how to work back into fixing the parse. But I am open to any guidance. My problem, I think, is that I don't know what I don't know here. I am currently trying list comprehensions (still no joy).
– Seth
Nov 14 '18 at 14:13
edited Nov 13 '18 at 13:02
Sociopath
asked Nov 13 '18 at 12:59
Seth
1 Answer
Let's say that you have ① wrong.csv and want to produce ② fixed.csv.
You want to read a line from ①, fix it, and write the fixed line to ②. This can be done like this:
with open('wrong.csv') as input, open('fixed.csv', 'w') as output:
    for line in input:
        line = fix(line)
        output.write(line)
Now we want to define the fix function...
Each line has either 4 or 5 fields, separated by commas (5 when a middle name is present). So we split the line using the comma as a delimiter and return the line unmodified if the number of fields is 4; otherwise we join field 0 and field 1 (Python counts from zero...), reassemble the output line, and return it to the caller.
def fix(line):
    items = line.split(',')  # items is a list of strings
    if len(items) == 4:  # no middle name: the line is OK as it stands
        return line
    # join first and middle name
    first_middle = ' '.join((items[0], items[1]))
    # we want to return a "fixed" line,
    # i.e., a string, not a list of strings:
    # we have to join the new name with the remaining info
    return ','.join([first_middle] + items[2:])
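As a variation on the same idea, here is a minimal sketch that uses the standard library's csv module instead of splitting on commas by hand. It assumes, based on the sample rows in the question, that a row has four fields, or five when a middle name is present. The helper name merge_middle_name and the in-memory io.StringIO objects standing in for wrong.csv and fixed.csv are illustrative choices, not part of the original answer.

```python
import csv
import io


def merge_middle_name(row):
    """Fold a middle name into the first-name field (hypothetical helper)."""
    if len(row) == 5:  # assumed layout: first, middle, last, age, state
        return [f"{row[0]} {row[1]}"] + row[2:]
    return row  # assumed layout: first, last, age, state


# In-memory stand-ins for wrong.csv and fixed.csv, using rows from the question.
source = io.StringIO(
    "john,doe,52,florida\n"
    "jane,mary,doe,55,texas\n"
)
target = io.StringIO()

writer = csv.writer(target, lineterminator="\n")
for row in csv.reader(source):
    writer.writerow(merge_middle_name(row))

fixed_text = target.getvalue()
print(fixed_text)
```

With real files, the two StringIO objects would be replaced by open('wrong.csv', newline='') and open('fixed.csv', 'w', newline=''). The csv module may be worth the extra lines if fields can ever contain quoted commas, which a plain split(',') would break apart.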
Of course the loop inside the with … code block can be reduced to output.write(''.join(fix(line) for line in input))
– gboffi
Nov 13 '18 at 22:37
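A closely related option, sketched here as an assumption rather than something the comment proposes, is the file method writelines, which accepts any iterable of strings and so avoids building one large joined string first. The fix function below is a corrected restatement for four-or-five-field rows like those in the question, and io.StringIO objects stand in for the real files.

```python
import io


def fix(line):
    """Merge first and middle name when the line has five fields."""
    items = line.split(",")
    if len(items) == 4:  # no middle name, nothing to do
        return line
    return ",".join([" ".join(items[:2])] + items[2:])


# In-memory stand-ins for the input and output files.
source = io.StringIO("fred,johnson,23,maine\nwally,mark,david,44,florida\n")
target = io.StringIO()

# writelines consumes the generator one fixed line at a time.
target.writelines(fix(line) for line in source)

result = target.getvalue()
print(result)
```

Each line keeps its trailing newline through split and join, so writelines does not need to add any line separators itself.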
answered Nov 13 '18 at 14:10
gboffi