Splitting a long string at indices in parallel in Python

I have many files that each have several million rows; each row is a dumped data entry several hundred characters long. The rows come in groups, and the first two characters tell me the type of row, which I use to parse it. This structure prevents me from loading the rows into a dataframe, for example, or from using anything else that does not go through the rows one at a time.

For each row, I currently create a dictionary vals = {} and then sequentially run through about fifty keys, along the lines of

vals['name'] = row[2:24]

vals['state'] = row[24:26]

Instead of doing fifty assignments sequentially, can I do this simultaneously or in parallel in some simple manner?

Is

vals['name'], vals['state'] = row[2:24], row[24:26]

faster if I do this simultaneous assignment for many entries? I could also reformulate this as a list comprehension. Would that be faster than running through sequentially?
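
For concreteness, the comprehension version I have in mind would look something like the sketch below; the field names and offsets are just placeholders for the roughly fifty real ones.

# Placeholder field specification: (key, start, end) for each fixed-width field.
# The real table would have about fifty entries.
FIELDS = [
    ("name", 2, 24),
    ("state", 24, 26),
]

def parse_row(row):
    # Build the dict in one expression instead of fifty separate assignments.
    return {key: row[start:end] for key, start, end in FIELDS}
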
python string parsing

edited Nov 14 '18 at 23:33

asked Nov 14 '18 at 18:01 by user3473556

1 Answer

To answer your question: no, multiple assignment will not speed up your program. The multiple-assignment syntax is just a different way of writing several assignments on separate lines.

For example,

vals['name'], vals['state'] = row[2:24], row[24:26]

is equivalent to

vals['name'] = row[2:24]
vals['state'] = row[24:26]
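
You can check this yourself by disassembling the statement; the snippet below only inspects the bytecode, it does not run the assignment:

import dis

# The disassembly shows the two values still being stored one at a time
# (two separate STORE_SUBSCR instructions), so nothing happens in parallel.
dis.dis("vals['name'], vals['state'] = row[2:24], row[24:26]")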


If you want to optimize your code, you should start by profiling it to determine which parts take the most time. I would also check that you are not doing multiple reads from the same file, as those are very slow compared to reading from memory. If possible, read the entire file into memory first and then process it.
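
As a rough sketch of both suggestions (the file name and the fields in parse_row are placeholders, not your real layout):

import cProfile

def parse_row(row):
    # Placeholder parser: slice out a couple of illustrative fields.
    return {"name": row[2:24], "state": row[24:26]}

def main(path="data.txt"):  # placeholder file name
    # Read the whole file into memory in one go, then process the lines.
    with open(path) as f:
        lines = f.readlines()
    return [parse_row(line) for line in lines]

# Profile first, so you optimize the part that is actually slow.
cProfile.run("main()")
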
answered Nov 14 '18 at 19:38 by jdrd