Structuring a large Python repository so that not everything gets imported











I'm having an issue managing imports in a big software repo that we have. For the sake of clarity, let's pretend the repo looks something like this:



repo/
    __init__.py
    utils/
        __init__.py
        math.py
        readers.py
        ...
    ...


Our __init__.py files are set up so that we can do something like this:



from repo.utils import IniReader 


In this example, repo/utils/__init__.py would contain:



from .readers import IniReader, DatReader


This structure has worked out well for us from a readability standpoint, but we are now facing issues when trying to deploy applications.



Here is the issue. Let's pretend I'm writing an app that looks like this:



from repo.utils import IniReader

if __name__ == '__main__':
    r = IniReader('blah.ini')
    print(r.fields)


Now the line from repo.utils import IniReader will execute repo/utils/__init__.py, which in this case imports both IniReader and DatReader. Let's pretend that DatReader looks something like this:



# repo/utils/readers.py
import numpy as np
import scipy
import tensorflow
from .math import transform

class DatReader:
    ...


This adheres to PEP 8, with all the imports at the top of the file.



The problem is that DatReader requires some heavyweight imports (numpy, scipy and tensorflow are all huge libraries). To make matters worse, the .math module imported via from .math import transform might itself contain something like from repo.contrib import lookup, which executes repo/contrib/__init__.py, which starts a chain reaction that ends up importing our entire repository.
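To make the chain reaction concrete, here is a minimal, self-contained sketch that reproduces it with a throwaway package written to a temporary directory. The package name demo_repo and the module heavy.py (a stand-in for numpy/scipy/tensorflow) are hypothetical; the only point is that an eager re-export in utils/__init__.py loads everything readers.py imports, even when the app asks for IniReader alone.

```python
import importlib
import pathlib
import sys
import tempfile
import textwrap


def build_eager_package(root: pathlib.Path) -> None:
    """Create a tiny, hypothetical stand-in for repo/utils with an eager __init__.py."""
    pkg = root / "demo_repo" / "utils"
    pkg.mkdir(parents=True)
    (root / "demo_repo" / "__init__.py").write_text("")
    # Mirrors `from .readers import IniReader, DatReader` in repo/utils/__init__.py.
    (pkg / "__init__.py").write_text("from .readers import IniReader, DatReader\n")
    (pkg / "readers.py").write_text(textwrap.dedent("""\
        from . import heavy  # stand-in for numpy/scipy/tensorflow


        class IniReader:
            pass


        class DatReader:
            pass
        """))
    (pkg / "heavy.py").write_text("LOADED = True\n")


with tempfile.TemporaryDirectory() as tmp:
    build_eager_package(pathlib.Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # The app only asks for IniReader...
        from demo_repo.utils import IniReader
        # ...but the "heavy" module has been imported anyway.
        heavy_loaded = "demo_repo.utils.heavy" in sys.modules
        print(heavy_loaded)  # True
    finally:
        sys.path.remove(tmp)
```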



This hasn't really been a problem for us developers, since we all have a full development environment set up, but now that we're trying to ship applications (internally), this import hell is becoming an issue.



Is there a standard solution to this problem? We've talked about keeping the __init__.py files empty, or not putting all the imports at the top of a file as PEP 8 recommends. Both of these solutions come with compromises, so if anyone has suggestions or references, I'd love to hear them.



Thanks!










Tags: python, deployment, import






asked Nov 10 at 17:10 by matt
























1 Answer (accepted)

          It might help to step back briefly and look at the fundamental issue you seem to be facing, namely: "How do I deal with missing Python packages on users' machines?"



          Broadly, there are two categories of solutions to this problem:




          1. Make the missing packages available on the user's machine.


            • You could distribute your code as a package that users can install with pip. Include the dependency specifications in your distributed package, and pip will automatically download and install any missing packages.

            • You could freeze your code, i.e. convert it into a standalone application that already bundles all the required packages.



          2. Divide your package dependencies into mandatory and optional ones, and adapt your code such that the absence of an optional package doesn't cause all of the code to break.


            • As you already noted, you could sanitize the package-level imports (i.e. the imports in __init__.py files) so that optional packages are not loaded prematurely. In your case, that would mean removing the DatReader import from repo/utils/__init__.py.

            • As you also noted, you could move the optional package imports inside the classes or functions that need them. Style-wise this is not optimal, but the code is still perfectly valid. It normally doesn't matter that the import statement runs again every time the function is called: the module is actually loaded only once, and later imports simply look it up in the sys.modules cache.

            • You could wrap the imports of optional packages in try-except clauses. This prevents import errors when the module is loaded (though of course you'll still get an error once you try to use a class or function that depends on the missing package).
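A minimal sketch of the deferred-import option from the list above, assuming a simplified, hypothetical DatReader. The standard-library statistics module stands in for a heavyweight package such as numpy so that the example runs anywhere.

```python
import sys


class DatReader:
    """Hypothetical, simplified stand-in for repo.utils.DatReader."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        # Deferred import: it runs only when mean() is called, so importing
        # this module (or the package that re-exports DatReader) never pays
        # for it. `statistics` stands in for a heavyweight package here.
        import statistics
        return statistics.mean(self.values)


reader = DatReader([1, 2, 3, 4])
print(reader.mean())  # 2.5
# After the first call the module object is cached in sys.modules, so the
# import statement inside mean() is only a cheap lookup on later calls.
print("statistics" in sys.modules)  # True
```

The trade-off is the one the answer mentions: the dependency is no longer visible at the top of the file, but merely importing the module stays cheap.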




          Example of an import in a try-except clause:



          import warnings

          try:
              import scipy
          except ImportError:
              scipy = None  # sentinel so later code can check whether scipy is available
              warnings.warn("The Python package `scipy` could not be imported. As a result, "
                            "the class `repo.utils.DatReader` will not be functional.")
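With this pattern the module still imports cleanly, so the error surfaces only when DatReader is actually used. A minimal sketch of making that failure explicit, assuming a module-level None sentinel; hypothetical_optional_dep is a made-up module name standing in for scipy, chosen so that the import genuinely fails.

```python
try:
    # Hypothetical optional dependency, assumed NOT to be installed,
    # so the except branch runs.
    import hypothetical_optional_dep as optional_dep
except ImportError:
    optional_dep = None  # sentinel checked at the point of use


class DatReader:
    """Hypothetical, simplified stand-in for repo.utils.DatReader."""

    def __init__(self, path):
        # Fail where the class is used, with an actionable message, rather
        # than with an obscure NameError/AttributeError deeper in the code.
        if optional_dep is None:
            raise ImportError(
                "DatReader requires the optional package "
                "'hypothetical_optional_dep'; install it to use this class."
            )
        self.path = path


try:
    DatReader("example.dat")
except ImportError as exc:
    print(f"DatReader unavailable: {exc}")
```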


          Now, to come back to your original question, "Is there a standard solution to this problem?": I'd say no. There's no silver bullet. Every solution has its own advantages and disadvantages, and you'll have to decide which one best fits your specific situation.
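As a sketch of one middle ground for the first option in category 2 (an __init__.py that doesn't eagerly import heavy submodules, while from repo.utils import IniReader keeps working): since Python 3.7, a module-level __getattr__ (PEP 562) lets a package's __init__.py import a submodule only when one of its names is first requested. The package lazy_repo, its split into ini.py and dat.py, and heavy.py (a stand-in for numpy/scipy/tensorflow) are all hypothetical. Note that this only helps if the heavy imports live in a different submodule than the lightweight classes.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical lazy utils/__init__.py using a module-level __getattr__ (PEP 562).
LAZY_INIT = '''\
import importlib

# Public name -> submodule that defines it (hypothetical layout).
_LAZY = {"IniReader": ".ini", "DatReader": ".dat"}
__all__ = list(_LAZY)


def __getattr__(name):
    # Only called when `name` is not already a module attribute.
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name], __name__)
        value = getattr(module, name)
        globals()[name] = value  # cache it; later lookups skip __getattr__
        return value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

with tempfile.TemporaryDirectory() as tmp:
    pkg = pathlib.Path(tmp) / "lazy_repo" / "utils"
    pkg.mkdir(parents=True)
    (pkg.parent / "__init__.py").write_text("")
    (pkg / "__init__.py").write_text(LAZY_INIT)
    (pkg / "ini.py").write_text("class IniReader:\n    pass\n")
    # dat.py pulls in heavy.py, the stand-in for numpy/scipy/tensorflow.
    (pkg / "dat.py").write_text("from . import heavy\n\n\nclass DatReader:\n    pass\n")
    (pkg / "heavy.py").write_text("LOADED = True\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        from lazy_repo.utils import IniReader
        heavy_after_ini = "lazy_repo.utils.heavy" in sys.modules
        from lazy_repo.utils import DatReader
        heavy_after_dat = "lazy_repo.utils.heavy" in sys.modules
        print(heavy_after_ini, heavy_after_dat)  # False True
    finally:
        sys.path.remove(tmp)
```

The cost of this approach is some indirection in __init__.py, and static tools may not see the lazily exported names unless you also provide type stubs or a TYPE_CHECKING import block.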






edited Nov 10 at 23:43
answered Nov 10 at 23:36 by Xukrao