Pandas - Pipe Delimiter exists in Field Value causing Bad Lines












0















I am using Pandas to import a text file like so:



data = pd.read_csv('filepath.txt', sep='|', quoting=3, 
error_bad_lines=False, encoding='latin1', low_memory=False)


I'm getting an error on 1 line because the field value has a Pipe found within it. When it attempts to parse the row, it finds that the length of the row is too long throwing an error. It allows the file to process; however, this row is missing.



Example:



Row -



4321|Test|1/2/1900
1234|Test||1/1/1900


Parsing this file will create:



4321   Test    1/2/1900
1234 Test 1/1/1900


I want to eliminate the extra | in the second row "Test|" or allow pandas to understand that it exists to create:



4321   Test    1/2/1900
1234 Test 1/1/1900


or this would be fine:



 1234   Test|    1/1/1900


I have attempted to use converters, other quoting methods (quotchars, etc), but to no avail.



Any ideas on how to get by this? All recommendations welcome.



Eric










share|improve this question

























  • Can you guarantee this is a problem with only the first line?

    – jpp
    Nov 15 '18 at 16:08











  • Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

    – Eric
    Nov 15 '18 at 16:20


















0















I am using Pandas to import a text file like so:



data = pd.read_csv('filepath.txt', sep='|', quoting=3, 
error_bad_lines=False, encoding='latin1', low_memory=False)


I'm getting an error on 1 line because the field value has a Pipe found within it. When it attempts to parse the row, it finds that the length of the row is too long throwing an error. It allows the file to process; however, this row is missing.



Example:



Row -



4321|Test|1/2/1900
1234|Test||1/1/1900


Parsing this file will create:



4321   Test    1/2/1900
1234 Test 1/1/1900


I want to eliminate the extra | in the second row "Test|" or allow pandas to understand that it exists to create:



4321   Test    1/2/1900
1234 Test 1/1/1900


or this would be fine:



 1234   Test|    1/1/1900


I have attempted to use converters, other quoting methods (quotchars, etc), but to no avail.



Any ideas on how to get by this? All recommendations welcome.



Eric










share|improve this question

























  • Can you guarantee this is a problem with only the first line?

    – jpp
    Nov 15 '18 at 16:08











  • Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

    – Eric
    Nov 15 '18 at 16:20
















0












0








0








I am using Pandas to import a text file like so:



data = pd.read_csv('filepath.txt', sep='|', quoting=3, 
error_bad_lines=False, encoding='latin1', low_memory=False)


I'm getting an error on 1 line because the field value has a Pipe found within it. When it attempts to parse the row, it finds that the length of the row is too long throwing an error. It allows the file to process; however, this row is missing.



Example:



Row -



4321|Test|1/2/1900
1234|Test||1/1/1900


Parsing this file will create:



4321   Test    1/2/1900
1234 Test 1/1/1900


I want to eliminate the extra | in the second row "Test|" or allow pandas to understand that it exists to create:



4321   Test    1/2/1900
1234 Test 1/1/1900


or this would be fine:



 1234   Test|    1/1/1900


I have attempted to use converters, other quoting methods (quotchars, etc), but to no avail.



Any ideas on how to get by this? All recommendations welcome.



Eric










share|improve this question
















I am using Pandas to import a text file like so:



data = pd.read_csv('filepath.txt', sep='|', quoting=3, 
error_bad_lines=False, encoding='latin1', low_memory=False)


I'm getting an error on 1 line because the field value has a Pipe found within it. When it attempts to parse the row, it finds that the length of the row is too long throwing an error. It allows the file to process; however, this row is missing.



Example:



Row -



4321|Test|1/2/1900
1234|Test||1/1/1900


Parsing this file will create:



4321   Test    1/2/1900
1234 Test 1/1/1900


I want to eliminate the extra | in the second row "Test|" or allow pandas to understand that it exists to create:



4321   Test    1/2/1900
1234 Test 1/1/1900


or this would be fine:



 1234   Test|    1/1/1900


I have attempted to use converters, other quoting methods (quotchars, etc), but to no avail.



Any ideas on how to get by this? All recommendations welcome.



Eric







python pandas






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 15 '18 at 17:12







Eric

















asked Nov 15 '18 at 16:03









EricEric

256




256













  • Can you guarantee this is a problem with only the first line?

    – jpp
    Nov 15 '18 at 16:08











  • Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

    – Eric
    Nov 15 '18 at 16:20





















  • Can you guarantee this is a problem with only the first line?

    – jpp
    Nov 15 '18 at 16:08











  • Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

    – Eric
    Nov 15 '18 at 16:20



















Can you guarantee this is a problem with only the first line?

– jpp
Nov 15 '18 at 16:08





Can you guarantee this is a problem with only the first line?

– jpp
Nov 15 '18 at 16:08













Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

– Eric
Nov 15 '18 at 16:20







Well, it's not actually the first line. It's line 116950 of the file. It will be random; however, I could potentially strip that string column out because I don't need it. Not sure how to evaluate the file before pandas consumes it.

– Eric
Nov 15 '18 at 16:20














1 Answer
1






active

oldest

votes


















0














I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:



import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
for line in f.readlines():
if "||" not in line:
buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
error_bad_lines=False, encoding='latin1', low_memory=False)


You could also do it outside of python with a find and replace operation.






share|improve this answer


























  • I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

    – Eric
    Nov 15 '18 at 18:32











  • Are you okay with dropping every row that contains a null value?

    – krflol
    Nov 15 '18 at 19:03













  • Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

    – Eric
    Nov 15 '18 at 21:13













  • edited. Should remove any line with '||'

    – krflol
    Nov 15 '18 at 21:16











  • This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

    – Eric
    Nov 16 '18 at 15:43











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53323388%2fpandas-pipe-delimiter-exists-in-field-value-causing-bad-lines%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:



import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
for line in f.readlines():
if "||" not in line:
buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
error_bad_lines=False, encoding='latin1', low_memory=False)


You could also do it outside of python with a find and replace operation.






share|improve this answer


























  • I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

    – Eric
    Nov 15 '18 at 18:32











  • Are you okay with dropping every row that contains a null value?

    – krflol
    Nov 15 '18 at 19:03













  • Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

    – Eric
    Nov 15 '18 at 21:13













  • edited. Should remove any line with '||'

    – krflol
    Nov 15 '18 at 21:16











  • This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

    – Eric
    Nov 16 '18 at 15:43
















0














I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:



import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
for line in f.readlines():
if "||" not in line:
buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
error_bad_lines=False, encoding='latin1', low_memory=False)


You could also do it outside of python with a find and replace operation.






share|improve this answer


























  • I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

    – Eric
    Nov 15 '18 at 18:32











  • Are you okay with dropping every row that contains a null value?

    – krflol
    Nov 15 '18 at 19:03













  • Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

    – Eric
    Nov 15 '18 at 21:13













  • edited. Should remove any line with '||'

    – krflol
    Nov 15 '18 at 21:16











  • This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

    – Eric
    Nov 16 '18 at 15:43














0












0








0







I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:



import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
for line in f.readlines():
if "||" not in line:
buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
error_bad_lines=False, encoding='latin1', low_memory=False)


You could also do it outside of python with a find and replace operation.






share|improve this answer















I think the easiest way would be to remove any instance of "||" then use pandas. An example of this would be:



import pandas as pd
from io import StringIO


buffer= StringIO()
with open(r'filepath.txt', 'r') as f:
for line in f.readlines():
if "||" not in line:
buffer.write(line)




buffer.seek(0)
data = pd.read_csv(buffer, sep='|', quoting=3,
error_bad_lines=False, encoding='latin1', low_memory=False)


You could also do it outside of python with a find and replace operation.







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 15 '18 at 21:15

























answered Nov 15 '18 at 17:20









krflolkrflol

53728




53728













  • I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

    – Eric
    Nov 15 '18 at 18:32











  • Are you okay with dropping every row that contains a null value?

    – krflol
    Nov 15 '18 at 19:03













  • Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

    – Eric
    Nov 15 '18 at 21:13













  • edited. Should remove any line with '||'

    – krflol
    Nov 15 '18 at 21:16











  • This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

    – Eric
    Nov 16 '18 at 15:43



















  • I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

    – Eric
    Nov 15 '18 at 18:32











  • Are you okay with dropping every row that contains a null value?

    – krflol
    Nov 15 '18 at 19:03













  • Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

    – Eric
    Nov 15 '18 at 21:13













  • edited. Should remove any line with '||'

    – krflol
    Nov 15 '18 at 21:16











  • This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

    – Eric
    Nov 16 '18 at 15:43

















I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

– Eric
Nov 15 '18 at 18:32





I considered this as an option. Unfortunately, NULL values and Non-NULL values exist within the file inside the same column, so this option would generate the same issue, just on different rows.

– Eric
Nov 15 '18 at 18:32













Are you okay with dropping every row that contains a null value?

– krflol
Nov 15 '18 at 19:03







Are you okay with dropping every row that contains a null value?

– krflol
Nov 15 '18 at 19:03















Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

– Eric
Nov 15 '18 at 21:13







Absolutely. I'm good with just deleting the whole column, if possible. I'd just like to do it programmatically instead of manually. I'm willing to accept any possible solution.

– Eric
Nov 15 '18 at 21:13















edited. Should remove any line with '||'

– krflol
Nov 15 '18 at 21:16





edited. Should remove any line with '||'

– krflol
Nov 15 '18 at 21:16













This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

– Eric
Nov 16 '18 at 15:43





This causes my file to drop to only 2 rows as most rows have a NULL value found in an individual column. It ignores all the rows where '||' is found. I think I may just have to request the file delimiter to be changed.

– Eric
Nov 16 '18 at 15:43




















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53323388%2fpandas-pipe-delimiter-exists-in-field-value-causing-bad-lines%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Florida Star v. B. J. F.

Danny Elfman

Retrieve a Users Dashboard in Tumblr with R and TumblR. Oauth Issues