splitting long string at indices in parallel in python
I have many files that each have several million rows; each row is a dumped data entry several hundred characters long. The rows come in groups, and the first two characters tell me the type of row, which I use to parse it. This structure prevents me from loading the rows into a dataframe, for example, or using anything else that does not go through the rows one at a time.
For each row, I currently create a dictionary vals = {}, and then sequentially run through about fifty keys along the lines of
vals['name'] = row[2:24]
vals['state'] = row[24:26]
Instead of doing fifty assignments sequentially, can I do this simultaneously or in parallel in some simple manner?
Is
vals['name'], vals['state'] = row[2:24], row[24:26]
faster if I do this simultaneous assignment for many entries? I could also reformulate this as a list comprehension. Would that be faster than running through sequentially?
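For reference, the comprehension version I have in mind would be a dict comprehension driven by a table of field offsets, something like this (only the first two fields are shown; the real table would have about fifty entries):

# Field layout for one row type: (key, start, end)
FIELDS = [
    ('name', 2, 24),
    ('state', 24, 26),
    # ... about fifty fields in total
]

def parse_row(row):
    # one dict comprehension instead of fifty separate assignments
    return {key: row[start:end] for key, start, end in FIELDS}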
python string parsing
asked Nov 14 '18 at 18:01 by user3473556 · edited Nov 14 '18 at 23:33
1 Answer
To answer your question: no, multiple assignment will not speed up your program. The multiple-assignment syntax is just a different way of writing the same assignments on separate lines.
For example
vals['name'], vals['state'] = row[2:24], row[24:26]
is equivalent to
vals['name'] = row[2:24]
vals['state'] = row[24:26]
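You can check this yourself with timeit; here is a quick sketch (the row below is just a dummy string of roughly the right length):

import timeit

row = "x" * 300  # dummy fixed-width row

def separate():
    vals = {}
    vals['name'] = row[2:24]
    vals['state'] = row[24:26]
    return vals

def combined():
    vals = {}
    vals['name'], vals['state'] = row[2:24], row[24:26]
    return vals

# both forms take essentially the same time per call
print(timeit.timeit(separate, number=1_000_000))
print(timeit.timeit(combined, number=1_000_000))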
If you want to optimize your code, you should start by profiling it to determine which parts are taking the most time. I would also check that you are not reading from the same file multiple times, since disk reads are very slow compared to reading from memory. If possible, read the entire file into memory first and then process it.
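As a rough sketch of that workflow (the file path and field layout here are placeholders, not your real ones):

import cProfile

FIELDS = [('name', 2, 24), ('state', 24, 26)]  # placeholder layout

def parse_file(path):
    # read the whole file into memory once, then parse line by line
    with open(path) as f:
        lines = f.readlines()
    return [{key: row[start:end] for key, start, end in FIELDS} for row in lines]

# profile a run to see where the time actually goes
cProfile.run("parse_file('data.txt')", sort='cumulative')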
answered Nov 14 '18 at 19:38 by jdrd