Asynchronous inference with a keras model and prefetch_to_device

I have built and trained a model with the Keras API. Now I need to create an efficient framework to run inference on a large number of input samples. The tricky part is that not all of those samples are available from the beginning; they are selected during inference based on the results of previous samples.



I can create a basic pipeline for that, with one process selecting samples and pushing them into a queue, and a second process retrieving and preprocessing them. Those samples are then fed to a model on the GPU (initialized only once at the beginning) with Keras' model.predict_on_batch(batch).
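A minimal sketch of the pipeline described above, under a few assumptions: threads stand in for the two processes (so the model object can be shared without pickling), StubModel is a hypothetical stand-in for the Keras model, and preprocess and select_next are made-up placeholders for the real preprocessing and selection logic. What it shows is the structure: the model is created once, and the selector only interacts with it through queues, feeding new samples based on earlier predictions.

```python
import queue
import threading

SENTINEL = None  # tells the next stage that no more work is coming


class StubModel:
    """Hypothetical stand-in for a Keras model that is loaded once at start-up."""

    def predict_on_batch(self, batch):
        # Pretend inference: one score per sample (the sum of its features).
        return [sum(sample) for sample in batch]


def preprocess(sample_id):
    """Hypothetical preprocessing: turn a sample id into a feature vector."""
    return [float(sample_id), float(sample_id) * 0.5]


def select_next(ids, scores, k=2):
    """Hypothetical selection rule: derive new ids from the k best samples."""
    best = sorted(zip(scores, ids), reverse=True)[:k]
    return [sample_id * 10 for _, sample_id in best]


def preprocess_worker(id_q, batch_q):
    """Stage 2: turn each list of selected ids into a preprocessed batch."""
    while (ids := id_q.get()) is not SENTINEL:
        batch_q.put([preprocess(i) for i in ids])
    batch_q.put(SENTINEL)


def inference_worker(model, batch_q, result_q):
    """Stage 3: run the already-loaded model on every batch that arrives."""
    while (batch := batch_q.get()) is not SENTINEL:
        result_q.put(model.predict_on_batch(batch))
    result_q.put(SENTINEL)


def run(initial_ids, rounds):
    id_q, batch_q, result_q = queue.Queue(), queue.Queue(), queue.Queue()
    model = StubModel()  # created once, reused for every batch
    workers = [
        threading.Thread(target=preprocess_worker, args=(id_q, batch_q)),
        threading.Thread(target=inference_worker, args=(model, batch_q, result_q)),
    ]
    for w in workers:
        w.start()

    ids, history = list(initial_ids), []
    for _ in range(rounds):
        id_q.put(ids)            # Stage 1: the selector pushes samples...
        scores = result_q.get()  # ...and waits for their predictions
        history.append((ids, scores))
        ids = select_next(ids, scores)

    id_q.put(SENTINEL)  # shut the pipeline down stage by stage
    for w in workers:
        w.join()
    return history
```

With CPU-heavy preprocessing, separate processes connected by multiprocessing.Queue would be the closer match to the setup described, since Python threads share the GIL.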

However, this would be quite slow, because each batch has to be copied from RAM to the GPU before the model can start working on it. I'd rather have a small queue on the GPU so that the next batch is already there when the model finishes the current one.
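The idea of keeping a small queue ahead of the model can be illustrated without TensorFlow: a bounded buffer filled by a background thread, so that loading batch n+1 overlaps with computation on batch n. This is only a host-side analogy in plain Python (prefetch here is a hypothetical helper, not a TensorFlow function); tf.data.experimental.prefetch_to_device applies the same principle with the buffer placed in GPU memory.

```python
import queue
import threading


class _Failure:
    """Wraps an exception raised by the source so the consumer can re-raise it."""

    def __init__(self, exc):
        self.exc = exc


_DONE = object()  # marks the end of the source iterator


def prefetch(source, buffer_size=2):
    """Yield items from `source` while a background thread stages the next ones.

    At most `buffer_size` items are prepared ahead of the consumer, so loading
    item n+1 overlaps with whatever the consumer does with item n. The thread
    starts when the generator is first advanced.
    """
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in source:
                buf.put(item)  # blocks while the buffer is full
        except Exception as exc:  # hand the error over instead of hanging
            buf.put(_Failure(exc))
            return
        buf.put(_DONE)

    # Daemon thread: if the consumer stops early, it won't keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item


if __name__ == "__main__":
    import time

    def load_batches(n):
        for i in range(n):
            time.sleep(0.1)  # simulated loading / host-to-device copy
            yield i

    start = time.perf_counter()
    for batch in prefetch(load_batches(5)):
        time.sleep(0.1)  # simulated model.predict_on_batch(batch)
    print(f"{time.perf_counter() - start:.2f}s")
```

Because loading and "inference" overlap, the timed loop should take roughly 0.6 s on an otherwise idle machine, versus roughly 1 s for the same work done strictly one step after the other.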



This seems to be possible with the TensorFlow Dataset API and prefetch_to_device [1]. However, using the Dataset API with a Keras model for inference does not seem to be straightforward:




  1. Inference with tf.data.Dataset has been asked about multiple times [2], [3], but the answers offer little explanation beyond the code snippets. I'm also not sure how to apply the suggestions to my Keras model checkpoint (.hdf5 file).

  2. How can I feed the Dataset asynchronously without rebuilding or reloading the graph each time the first process selects new samples? [4], [5]
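For the second point, one common pattern (not taken from the references above) is to build the input pipeline once around a generator that reads from a queue: the selecting process only ever puts samples on the queue, so nothing has to be rebuilt when new samples are chosen. The sketch is plain Python; STOP and dynamic_batches are hypothetical names, and the batching rule (block for one sample, then take whatever else is already waiting) is a swapped-in choice so that a selector waiting on results never leaves a half-filled batch stuck.

```python
import queue

STOP = object()  # hypothetical end-of-stream marker


def dynamic_batches(q, max_batch_size):
    """Yield lists of samples taken from `q` until STOP is received.

    Blocks only for the first sample of each batch, then adds whatever else is
    already waiting (up to `max_batch_size`) without waiting for more.
    """
    while True:
        first = q.get()  # wait for at least one sample
        if first is STOP:
            return
        batch = [first]
        while len(batch) < max_batch_size:
            try:
                item = q.get_nowait()
            except queue.Empty:
                break  # nothing else queued right now: ship what we have
            if item is STOP:
                yield batch  # flush the final batch, then finish
                return
            batch.append(item)
        yield batch
```

In recent TensorFlow 2.x releases, a generator like this could be wrapped once with tf.data.Dataset.from_generator, passing a zero-argument callable such as lambda: dynamic_batches(q, 32) and an output_signature whose first dimension is None (batch sizes vary), followed by .apply(tf.data.experimental.prefetch_to_device("/gpu:0")), which the TensorFlow documentation says must be the last transformation in the pipeline. The Keras model would be loaded a single time with tf.keras.models.load_model on the .hdf5 checkpoint. This combination is untested here, so treat it as a starting point.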


I'm not too familiar with plain TensorFlow code without the Keras abstractions, so I might have overlooked something obvious in the references. I'd be very grateful for a detailed explanation or pointers to more sources.

Tags: asynchronous keras gpu tensorflow-datasets inference

asked Nov 16 '18 at 8:18 by Johnny TGun (edited Nov 16 '18 at 9:12)