Slow numpy array indexing for keras time series generator


























I use the Keras time series generator to train a neural network with LSTM cells, and the generator has unfortunately proved to be the bottleneck in training.

Below is a simplified, runnable example that shows the high runtime of the batch generator. Note that the rows from the dataset are chosen randomly, so a plain sliding window is not possible. During training the CPUs run continuously at about 80%, whereas GPU utilization stays in the single-digit percent range.



import time

import numpy as np


def get_time_series(data, index, look_back, batch_size):
    # Note: `index` is unused in this simplified example
    samples1 = np.empty((batch_size, look_back, np.size(data, axis=1)))
    # Random end rows; the upper bound must be the number of rows (axis=0),
    # otherwise randint fails because look_back exceeds the column count
    rows = np.random.randint(look_back, np.size(data, axis=0), size=batch_size)
    for j, row in enumerate(rows):
        indices = range(row - look_back, row, 1)
        samples1[j] = data[indices]
    return samples1


data = np.random.rand(100000, 20)
start = time.time()
batch = get_time_series(data, index=50, look_back=1000, batch_size=2**12)
print("Batch generator needs", time.time()-start, "seconds")


Result:



Batch generator needs 0.6224319934844971 seconds


I already tried building the full 3-D array up front, so that I only have to index its rows in the get_time_series function. This was about 60 times faster during training, but it leads to an out-of-memory error with large datasets.
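A rough back-of-the-envelope check of why materializing every window runs out of memory. This is only a sketch: the helper name `materialized_window_bytes` is made up for illustration, and the sizes are the ones from the simplified example above (100000 rows, 20 features, look_back of 1000, float64), not those of the real dataset.

```python
import numpy as np


def materialized_window_bytes(n_rows, n_features, look_back, dtype=np.float64):
    """Bytes needed if every look_back-row window is stored as its own copy."""
    n_windows = n_rows - look_back + 1
    return n_windows * look_back * n_features * np.dtype(dtype).itemsize


# Sizes taken from the simplified example; the real dataset may differ
raw = 100_000 * 20 * np.dtype(np.float64).itemsize
full = materialized_window_bytes(100_000, 20, 1_000)
print(f"raw data: {raw / 1e6:.0f} MB, all windows copied: {full / 1e9:.2f} GB")
```

Every row is repeated in up to look_back windows, so the copied 3-D array is roughly look_back times larger than the raw data.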



Does anyone have ideas on how to remove this bottleneck? Working with pointers, faster indexing methods, ...?



Thanks,
Max










      performance numpy tensorflow keras generator














      edited Nov 12 at 12:33

























      asked Nov 12 at 11:07









      Schaefma3

      112




























          1 Answer
































          EDIT 2:



          I am not sure this will be any faster, but you can also do something like the following. It still relies on advanced indexing, although each window covers contiguous rows, so it may be a bit better:



          import numpy as np

          def get_time_series(data, indices, look_back):
              # Make sure indices are big enough
              indices = indices[indices >= look_back]
              # Make indexing matrix
              idx = indices[:, np.newaxis] + np.arange(-look_back, 0)
              # Make batch
              return data[idx]
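As a sanity check, the broadcast index matrix can be compared against an explicit loop over basic slices. This is a minimal sketch: the function names `windows_by_index_matrix` and `windows_by_slices`, the array sizes, and the seed are made up for illustration. It only confirms that both approaches return the same windows; which one is faster depends on the sizes involved and should be measured on the real data.

```python
import numpy as np


def windows_by_index_matrix(data, ends, look_back):
    # Broadcast (batch, 1) + (look_back,) into a (batch, look_back) matrix of row indices
    idx = ends[:, np.newaxis] + np.arange(-look_back, 0)
    return data[idx]


def windows_by_slices(data, ends, look_back):
    out = np.empty((len(ends), look_back, data.shape[1]), dtype=data.dtype)
    for j, end in enumerate(ends):
        # A basic slice is a view; assigning it into `out` makes one contiguous copy
        out[j] = data[end - look_back:end]
    return out


rng = np.random.default_rng(0)
data = rng.random((500, 4))
look_back = 10
# End positions are exclusive, so len(data) itself is a valid end
ends = rng.integers(look_back, len(data) + 1, size=32)

a = windows_by_index_matrix(data, ends, look_back)
b = windows_by_slices(data, ends, look_back)
print(a.shape, np.array_equal(a, b))
```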


          You would use it for example like this:



          import numpy as np

          def get_time_series(data, indices, look_back):
              indices = indices[indices >= look_back]
              idx = indices[:, np.newaxis] + np.arange(-look_back, 0)
              return data[idx]

          def make_batches(data, look_back, batch_size):
              indices = np.random.permutation(np.arange(look_back, len(data) + 1))
              for i in range(0, len(indices), batch_size):
                  yield get_time_series(data, indices[i:i + batch_size], look_back)

          data = ...
          look_back = ...
          batch_size = ...
          for batch in make_batches(data, look_back, batch_size):
              # Use batch
              pass




          EDIT:



          If you want to shuffle the examples, you could first build the sliding window over the whole dataset (it is a strided view, so it should take essentially no extra memory or time) and then take batches from a shuffled index:



          # Make sliding window with the original as_strided version of get_time_series (the last function in this answer)
          data_sw = get_time_series(data, 0, look_back, len(data))
          # Random index
          batch_idx = np.random.permutation(len(data_sw))
          # To get the first batch
          batch = data_sw[batch_idx[:batch_size]]
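The same shuffled-batch idea can also be sketched with `numpy.lib.stride_tricks.sliding_window_view`, which NumPy provides from version 1.20 onward and which does the stride bookkeeping itself. This is an alternative to the `as_strided` function from this answer, not a replacement the answer proposes; the test data, seed, and variable names are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(300).reshape((100, 3))
look_back = 5
batch_size = 4

# Result has shape (n_windows, n_features, look_back): the window axis is appended last
windows = sliding_window_view(data, look_back, axis=0)
# Reorder to (n_windows, look_back, n_features); this is still a view on `data`
windows = windows.transpose(0, 2, 1)

rng = np.random.default_rng(0)
batch_idx = rng.permutation(len(windows))
# Advanced indexing copies only the windows selected for this batch
batch = windows[batch_idx[:batch_size]]
print(windows.shape, batch.shape)
```

The copy made for each batch is unavoidable if the batch has to be a contiguous array, so this mainly saves memory rather than indexing time.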




          I think this does what you want, and it should be considerably faster than using loops:



          import numpy as np
          from numpy.lib.stride_tricks import as_strided

          def get_time_series(data, index, look_back, batch_size):
              # Index should be at least as big as look_back to have enough elements before it
              index = max(index, look_back)
              # Batch size should not go beyond the array
              batch_size = min(batch_size, len(data) - index + 1)
              # Relevant slice for the batch
              data_slice = data[index - look_back:index + batch_size]
              # Reshape with stride tricks as a "sliding window"
              data_strides = data_slice.strides
              batch_shape = (batch_size, look_back, data_slice.shape[-1])
              batch_strides = (data_strides[0], data_strides[0], data_strides[1])
              return as_strided(data_slice, batch_shape, batch_strides, writeable=False)

          # Test
          data = np.arange(300).reshape((100, 3))
          batch = get_time_series(data, 20, 5, 4)
          print(batch)


          Output:



          [[[45 46 47]
            [48 49 50]
            [51 52 53]
            [54 55 56]
            [57 58 59]]

           [[48 49 50]
            [51 52 53]
            [54 55 56]
            [57 58 59]
            [60 61 62]]

           [[51 52 53]
            [54 55 56]
            [57 58 59]
            [60 61 62]
            [63 64 65]]

           [[54 55 56]
            [57 58 59]
            [60 61 62]
            [63 64 65]
            [66 67 68]]]
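To see that the strided batch is a view and not a copy, one can check that it shares memory with the original array. A small sketch, rebuilding the same four windows as the test above with hard-coded bounds chosen for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

data = np.arange(300).reshape((100, 3))
look_back, batch_size = 5, 4

# Rows 15..22 are exactly the rows touched by the four windows printed above
data_slice = data[15:15 + look_back + batch_size - 1]
s0, s1 = data_slice.strides
batch = as_strided(data_slice, (batch_size, look_back, 3), (s0, s0, s1), writeable=False)

print(np.shares_memory(batch, data))  # the windows alias the buffer of `data`
print(batch.flags.writeable)          # read-only, so overlapping rows cannot be modified by accident
```

Because the windows overlap in memory, the view should be treated as read-only; any operation that needs an independent array (for example advanced indexing with a shuffled index) makes a copy of just the selected windows.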




























          • Hi jdehesa, thanks for your answer. I forgot to mention that I want to select the datasets randomly and therefore no sliding window is possible. If I use your snippet and change the index calculation, it's not faster anymore.
            – Schaefma3
            Nov 12 at 12:41












          • @Schaefma3 Would it be possible to shuffle first and then take the sliding window?
            – jdehesa
            Nov 12 at 12:43










          • No, because the LSTM cell learns from the past samples (rows) and makes a prediction for the next time step. Therefore, the order of the inputs must also be in the actual time sequence.
            – Schaefma3
            Nov 12 at 12:48










          • @Schaefma3 Ah I see what you mean, you want to shuffle the examples, of course. I have added one possibility for that, check out if that could work for you (note I made a small fix in get_time_series).
            – jdehesa
            Nov 12 at 12:56












          • This works, but the runtime is only reduced by half. The problem is that indexing into a 3-D NumPy array is extremely slow compared to a 2-D array. The fact that I now have a 3-D array of pointers therefore does not change much (apart from using less memory). Do you have another idea to increase performance significantly? Thanks for your efforts!
            – Schaefma3
            Nov 13 at 8:57











          edited Nov 13 at 10:49

























          answered Nov 12 at 12:03









          jdehesa

          22.2k43150
















          • Hi jdehesa, thanks for your answer. I forgot to mention that I want to select the windows randomly, so a sliding window is not possible. If I use your snippet and change the index calculation, it is no longer faster.
            – Schaefma3
            Nov 12 at 12:41

          • @Schaefma3 Would it be possible to shuffle first and then take the sliding window?
            – jdehesa
            Nov 12 at 12:43

          • No, because the LSTM cell learns from the past samples (rows) and makes a prediction for the next time step. The inputs must therefore stay in their actual time order.
            – Schaefma3
            Nov 12 at 12:48

          • @Schaefma3 Ah, I see what you mean: you want to shuffle the examples, of course. I have added one possibility for that; check whether it works for you (note that I made a small fix in get_time_series).
            – jdehesa
            Nov 12 at 12:56

          • This works, but the runtime is only reduced by half. The problem is that indexing into a 3D NumPy array is extremely slow compared to a 2D array. Having a 3D array of pointers therefore changes little, apart from using less memory. Do you have another idea to increase performance significantly? Thanks for your efforts!
            – Schaefma3
            Nov 13 at 8:57



































