Downloading a public data directory from Google Cloud Storage with command-line utilities like wget


























I would like to download publicly available data from Google Cloud Storage. However, because I need to work in a Python 3.x environment, I cannot use gsutil. I can download individual files with wget:

wget http://storage.googleapis.com/path-to-file/output_filename -O output_filename

However, commands like

wget -r --no-parent https://console.cloud.google.com/path_to_directory/output_directoryname -O output_directoryname

do not seem to work: they just download an index file for the directory. My initial attempts with rsync and curl failed as well. How can I download publicly available data on Google Cloud Storage as a whole directory?










      google-cloud-storage wget gsutil






      asked Nov 12 '18 at 15:31









      dylkot
























          1 Answer

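          The approach you mentioned does not work because Google Cloud Storage doesn't have real "directories". For example, "path/to/some/files/file.txt" is the entire name of that object; a similarly named object, "path/to/some/files/file2.txt", just happens to share the same name prefix. There is no directory resource for wget to crawl recursively. (Note also that console.cloud.google.com serves the Cloud Console web UI rather than the objects themselves; public objects are served from storage.googleapis.com, as in your single-file command.)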
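To make the "names are flat, folders are just shared prefixes" point concrete, here is a small, purely local Python sketch. It takes a flat list of object names and groups them the way a listing with a prefix and a "/" delimiter presents them: names directly under the prefix, plus the deeper prefixes that tools usually draw as subfolders. The function name `split_listing` and the sample names are hypothetical; no GCS call is made.

```python
def split_listing(names, prefix="", delimiter="/"):
    """Group flat object names the way a prefix+delimiter listing shows them.

    Returns (objects, subprefixes): names sitting directly under `prefix`,
    and the distinct deeper prefixes that UIs typically render as folders.
    """
    objects = []
    subprefixes = set()
    for name in names:
        if not name.startswith(prefix):
            continue  # not under the requested prefix at all
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Another delimiter follows, so this name lives "deeper".
            subprefixes.add(prefix + head + delimiter)
        else:
            objects.append(name)
    return sorted(objects), sorted(subprefixes)


names = [
    "path/to/some/files/file.txt",
    "path/to/some/files/file2.txt",
    "path/to/some/files/sub/file3.txt",
    "path/to/other.txt",
]
print(split_listing(names, "path/to/some/files/"))
```

Nothing here is a directory: "path/to/some/files/sub/" only appears because at least one object name happens to start with it.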



          As for how you could fetch these files: the GCS APIs (both XML and JSON) let you list the objects in the parent bucket filtered by a prefix; in this case, you'd want all objects whose names start with "path/to/some/files/". You could then make an individual HTTP request for each object named in the response body. That said, you'd probably find this much easier to do with one of the GCS client libraries, such as the Python library.
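A minimal standard-library sketch of that list-then-fetch loop follows. It assumes the bucket grants anonymous users permission to *list* objects (read access to individual objects alone is not enough), and it uses the JSON API list endpoint `https://storage.googleapis.com/storage/v1/b/BUCKET/o` with its `prefix` and `pageToken` query parameters, following `nextPageToken` until the listing is exhausted. Each object is then fetched from the public `https://storage.googleapis.com/BUCKET/OBJECT` URL, the same form as the single-file wget in the question. The helper names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request

API_ROOT = "https://storage.googleapis.com"


def list_objects_url(bucket, prefix, page_token=None):
    """Build a JSON API objects.list URL for one page of results."""
    params = {"prefix": prefix}
    if page_token:
        params["pageToken"] = page_token
    return f"{API_ROOT}/storage/v1/b/{bucket}/o?" + urllib.parse.urlencode(params)


def list_object_names(bucket, prefix):
    """Return every object name under prefix, following nextPageToken."""
    names, token = [], None
    while True:
        with urllib.request.urlopen(list_objects_url(bucket, prefix, token)) as resp:
            page = json.load(resp)
        names.extend(item["name"] for item in page.get("items", []))
        token = page.get("nextPageToken")
        if not token:
            return names


def media_url(bucket, name):
    """Public download URL; '/' stays literal, other unsafe chars are escaped."""
    return f"{API_ROOT}/{bucket}/{urllib.parse.quote(name, safe='/')}"


def download_prefix(bucket, prefix, dest_dir):
    """Mirror all objects under prefix into dest_dir, one HTTP GET per object."""
    for name in list_object_names(bucket, prefix):
        if name.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects, if present
        rel = name[len(prefix):].lstrip("/")
        local_path = os.path.join(dest_dir, *rel.split("/"))
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        urllib.request.urlretrieve(media_url(bucket, name), local_path)
```

Usage would look like `download_prefix("my-bucket", "path/to/some/files/", "output_directoryname")`, where "my-bucket" is a placeholder. Fetching objects sequentially keeps the sketch simple; for many small objects you may want to parallelize the downloads.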



          Also, gsutil currently (as of November 2018) has an open GitHub issue tracking support for Python 3.






          • Thanks! I'll try out using the Python library to get the complete list of files in the directory and then download them one at a time. Hopefully gsutil will eventually add support for Python 3...
            – dylkot
            Nov 14 '18 at 5:45












          answered Nov 12 '18 at 21:11









          mhouglum











