Problem with Python memory, flush, CSV size
After sorting a dataset, I have a problem at this point in my code:
with open(fns_land[xx]) as infile:
    lines = infile.readlines()
    for line in lines:
        result_station.append(line.split(',')[0])
        result_date.append(line.split(',')[1])
        result_metar.append(line.split(',')[-1])
I have a problem with the lines line: the data are sometimes too huge and I get a kill error (the process runs out of memory).
Is there a short/nice way to rewrite this point?
python arrays memory flush
asked Nov 14 '18 at 14:17 by S.Kociok, edited Nov 14 '18 at 15:56 by toti08
Possible duplicate of Python readlines() usage and efficient practice for reading – The Pjot, Nov 14 '18 at 14:23
2 Answers
Use readline instead; it reads one line at a time without loading the entire file into memory.
with open(fns_land[xx]) as infile:
    while True:
        line = infile.readline()
        if not line:
            break
        result_station.append(line.split(',')[0])
        result_date.append(line.split(',')[1])
        result_metar.append(line.split(',')[-1])
answered Nov 14 '18 at 14:24 by Rocky Li
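As a side note to the answer above, iterating directly over the file object streams lines the same way and is the more idiomatic spelling; a minimal sketch, reusing the fns_land[xx] path and the three result lists from the question:

with open(fns_land[xx]) as infile:
    for line in infile:                       # the file object yields one line at a time
        parts = line.rstrip('\n').split(',')  # split once instead of three times
        result_station.append(parts[0])
        result_date.append(parts[1])
        result_metar.append(parts[-1])

Splitting each line only once also avoids doing the same work three times per line.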
If you are dealing with a dataset, I would suggest that you have a look at pandas, which is great for data wrangling.
If your problem is a large dataset, you could load the data in chunks.
import pandas as pd
tfr = pd.read_csv('fns_land{0}.csv'.format(xx), iterator=True, chunksize=1000)
- Line 1: imports the pandas module.
- Line 2: reads the data from your csv file in chunks of 1000 lines.
This will be of type pandas.io.parsers.TextFileReader. To load the entire csv file, you follow up with:
df = pd.concat(tfr, ignore_index=True)
The parameter ignore_index=True is added to avoid duplicate indexes.
You now have all your data loaded into a dataframe. Then do your data manipulation on the columns as vectors, which is also faster than regular line-by-line processing.
Have a look at this question, which dealt with something similar.
answered Nov 14 '18 at 14:41 by Philip
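Put together, a minimal end-to-end sketch of this chunked approach might look like the following; the file name fns_land0.csv and header=None are placeholder assumptions, not taken from the original post:

import pandas as pd

# Read the csv in chunks of 1000 rows; tfr yields one DataFrame per chunk.
tfr = pd.read_csv('fns_land0.csv', header=None, iterator=True, chunksize=1000)

# Stitch the chunks back into a single DataFrame with a fresh index.
df = pd.concat(tfr, ignore_index=True)

# Work on whole columns (vectorised) instead of looping line by line,
# e.g. the first, second and last columns from the question:
result_station = df.iloc[:, 0]
result_date = df.iloc[:, 1]
result_metar = df.iloc[:, -1]

Note that concatenating all chunks still needs enough memory for the full table; chunking mainly helps when each chunk is processed and discarded in turn.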
Thanks. But for my use the open method was the best way. I only want to read in three columns out of 1000 columns. Next time pandas may be the better way. – S.Kociok, Nov 14 '18 at 14:54
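For that case, pandas can also restrict the read to just the needed columns with usecols, so only those three columns are kept in memory. A minimal sketch, where the file name, header=None and the index of the last column (999 for 1000 columns) are assumptions, not from the original post:

import pandas as pd

LAST_COL = 999  # assumed position of the last of the 1000 columns

# Only the listed column positions are loaded into the DataFrame.
df = pd.read_csv('fns_land0.csv', header=None, usecols=[0, 1, LAST_COL])

result_station = df[0].tolist()
result_date = df[1].tolist()
result_metar = df[LAST_COL].tolist()

This can also be combined with chunksize from the answer above if even three columns are too large to hold at once.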