Slow numpy array indexing for keras time series generator
I use a Keras time-series generator to train a neural network with LSTM cells. Unfortunately, the generator has turned out to be the bottleneck in training.
Below is a simplified, runnable example that shows the high runtime of the batch generator. Note that the rows of the dataset are chosen randomly, so a sliding window is not possible. During training the CPUs run continuously at about 80%, whereas GPU utilization stays in the single digits.
import time
import numpy as np

def get_time_series(data, index, look_back, batch_size):
    # `index` is not used in this simplified version
    samples1 = np.empty((batch_size, look_back, np.size(data, axis=1)))
    # Random end rows: the upper bound is the number of rows (axis=0), not of features
    rows = np.random.randint(look_back, np.size(data, axis=0), size=batch_size)
    for j, row in enumerate(rows):
        # Copy the look_back rows preceding `row` into the batch
        indices = range(row - look_back, row, 1)
        samples1[j] = data[indices]
    return samples1

data = np.random.rand(100000, 20)
start = time.time()
batch = get_time_series(data, index=50, look_back=1000, batch_size=2**12)
print("Batch generator needs", time.time() - start, "seconds")
Result:
Batch generator needs 0.6224319934844971 seconds
I already tried building the whole 3-D array first, so that the get_time_series function only has to index rows of it. This was about 60 times faster during training, but leads to an out-of-memory error with large datasets.
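A back-of-the-envelope estimate of why precomputing every window runs out of memory, assuming float64 values (the dtype np.random.rand returns) and the dimensions from the example above:

```python
import numpy as np

# Dimensions from the example above
n_rows, n_features = 100_000, 20
look_back, batch_size = 1000, 2**12
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per value

# Every possible window, stored as one dense 3-D array
n_windows = n_rows - look_back + 1
full_bytes = n_windows * look_back * n_features * itemsize

# A single batch of windows
batch_bytes = batch_size * look_back * n_features * itemsize

print(f"all windows: {full_bytes / 1e9:.1f} GB")   # all windows: 15.8 GB
print(f"one batch:   {batch_bytes / 1e6:.0f} MB")  # one batch:   655 MB
```

Even a single batch means filling roughly 655 MB, which suggests that part of the per-batch time is plain memory traffic, whichever indexing method is used.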
Does anyone have ideas on how to remove this bottleneck? Working with pointers, faster indexing methods, ...?
Thanks,
Max
performance numpy tensorflow keras generator
edited Nov 12 at 12:33
asked Nov 12 at 11:07
Schaefma3
1 Answer
EDIT 2:
I'm not sure whether this is any faster, but you can also do something like the following. It still relies on advanced indexing, although over contiguous data, so it may be a bit better:
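import numpy as np

def get_time_series(data, indices, look_back):
    # Keep only end indices that have at least look_back rows before them
    indices = indices[indices >= look_back]
    # Index matrix: row i holds indices[i] - look_back, ..., indices[i] - 1
    idx = indices[:, np.newaxis] + np.arange(-look_back, 0)
    # One advanced-indexing call gathers the whole (batch, look_back, features) array
    return data[idx]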
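A small sanity check of the broadcast index matrix above against a question-style loop. It is a sketch on random toy data; loop_windows and broadcast_windows are hypothetical helper names, and the loop assumes end rows are drawn from the number of rows (axis 0) of data.

```python
import numpy as np

def loop_windows(data, rows, look_back):
    # Question-style loop: copy one window per end row
    out = np.empty((len(rows), look_back, data.shape[1]), dtype=data.dtype)
    for j, row in enumerate(rows):
        out[j] = data[row - look_back:row]
    return out

def broadcast_windows(data, rows, look_back):
    # Broadcast index matrix: one fancy-indexing call for the whole batch
    idx = rows[:, np.newaxis] + np.arange(-look_back, 0)
    return data[idx]

rng = np.random.default_rng(0)
data = rng.random((500, 4))
look_back = 30
rows = rng.integers(look_back, len(data) + 1, size=64)  # valid end rows only

a = loop_windows(data, rows, look_back)
b = broadcast_windows(data, rows, look_back)
print(a.shape, np.array_equal(a, b))  # (64, 30, 4) True
```

The loop slices data[row - look_back:row] instead of indexing with a range object; a slice is basic indexing, so no index list is built per window, but the batch is still assembled one window at a time in Python.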
You would use it for example like this:
import numpy as np

def get_time_series(data, indices, look_back):
    indices = indices[indices >= look_back]
    idx = indices[:, np.newaxis] + np.arange(-look_back, 0)
    return data[idx]

def make_batches(data, look_back, batch_size):
    # Shuffle all valid window end indices once per pass over the data
    indices = np.random.permutation(np.arange(look_back, len(data) + 1))
    for i in range(0, len(indices), batch_size):
        yield get_time_series(data, indices[i:i + batch_size], look_back)

data = ...
look_back = ...
batch_size = ...
for batch in make_batches(data, look_back, batch_size):
    pass  # Use batch
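A concrete run of make_batches on a small toy array, assuming the get_time_series(data, indices, look_back) and make_batches definitions above (repeated so the block runs on its own). It checks that the time steps inside every window stay in chronological order, which is what the LSTM needs, while only the order of the windows is shuffled.

```python
import numpy as np

def get_time_series(data, indices, look_back):
    indices = indices[indices >= look_back]
    idx = indices[:, np.newaxis] + np.arange(-look_back, 0)
    return data[idx]

def make_batches(data, look_back, batch_size):
    indices = np.random.permutation(np.arange(look_back, len(data) + 1))
    for i in range(0, len(indices), batch_size):
        yield get_time_series(data, indices[i:i + batch_size], look_back)

# Toy series: row t holds the value t in every feature, so order is easy to check
data = np.repeat(np.arange(50)[:, np.newaxis], 2, axis=1)
look_back, batch_size = 5, 8

batches = list(make_batches(data, look_back, batch_size))
windows = np.concatenate(batches)

# Inside each window, time steps are consecutive and increasing
assert np.all(np.diff(windows[:, :, 0], axis=1) == 1)
# Each of the len(data) - look_back + 1 windows appears exactly once
assert np.array_equal(np.sort(windows[:, 0, 0]), np.arange(46))
print(len(batches), windows.shape)  # 6 (46, 5, 2)
```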
EDIT:
If you want to shuffle the examples, you could first build the sliding window over the whole dataset (it is only a strided view, so it takes essentially no extra memory or time) and then take batches from a shuffled index:
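# Sliding window over the whole dataset, using the stride-tricks
# get_time_series(data, index, look_back, batch_size) from the original answer
data_sw = get_time_series(data, 0, look_back, len(data))
# Shuffled order of the window indices
batch_idx = np.random.permutation(len(data_sw))
# First batch (advanced indexing copies the selected windows)
batch = data_sw[batch_idx[:batch_size]]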
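On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view can build the same whole-dataset window view without hand-computed strides. This is a swapped-in alternative, not what the answer uses, and the toy data and batch_size are illustrative. Note that it appends the window axis last, so the axes have to be reordered to get the (windows, look_back, features) layout used here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(300).reshape((100, 3))
look_back, batch_size = 5, 4

# Shape (100 - 5 + 1, 3, 5): the window axis is appended at the end...
view = sliding_window_view(data, look_back, axis=0)
# ...so move it in front of the feature axis: (96, 5, 3), still a read-only view
data_sw = view.transpose(0, 2, 1)

# Shuffle the window indices and gather one batch (fancy indexing makes a copy)
batch_idx = np.random.permutation(len(data_sw))
batch = data_sw[batch_idx[:batch_size]]

print(data_sw.shape, batch.shape)  # (96, 5, 3) (4, 5, 3)
```

Window w of data_sw holds rows w to w + look_back - 1, the same layout as the stride-tricks version with index 0.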
I think this does what you want, and it should be considerably faster than using loops:
import numpy as np
from numpy.lib.stride_tricks import as_strided

def get_time_series(data, index, look_back, batch_size):
    # Index should be at least as big as look_back to have enough elements before it
    index = max(index, look_back)
    # Batch size should not go beyond the array
    batch_size = min(batch_size, len(data) - index + 1)
    # Relevant slice for the batch
    data_slice = data[index - look_back:index + batch_size]
    # Reshape with stride tricks as a "sliding window": consecutive windows start one row apart
    data_strides = data_slice.strides
    batch_shape = (batch_size, look_back, data_slice.shape[-1])
    batch_strides = (data_strides[0], data_strides[0], data_strides[1])
    # Read-only, because overlapping windows share the same memory
    return as_strided(data_slice, batch_shape, batch_strides, writeable=False)

# Test
data = np.arange(300).reshape((100, 3))
batch = get_time_series(data, 20, 5, 4)
print(batch)
Output:
[[[45 46 47]
  [48 49 50]
  [51 52 53]
  [54 55 56]
  [57 58 59]]

 [[48 49 50]
  [51 52 53]
  [54 55 56]
  [57 58 59]
  [60 61 62]]

 [[51 52 53]
  [54 55 56]
  [57 58 59]
  [60 61 62]
  [63 64 65]]

 [[54 55 56]
  [57 58 59]
  [60 61 62]
  [63 64 65]
  [66 67 68]]]
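A quick way to see what the stride-tricks version returns, assuming the get_time_series(data, index, look_back, batch_size) definition above (repeated so the block runs on its own): the result is a read-only view into data, so no window data is copied until an operation such as fancy indexing or np.ascontiguousarray makes a copy.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def get_time_series(data, index, look_back, batch_size):
    index = max(index, look_back)
    batch_size = min(batch_size, len(data) - index + 1)
    data_slice = data[index - look_back:index + batch_size]
    data_strides = data_slice.strides
    batch_shape = (batch_size, look_back, data_slice.shape[-1])
    batch_strides = (data_strides[0], data_strides[0], data_strides[1])
    return as_strided(data_slice, batch_shape, batch_strides, writeable=False)

data = np.arange(300).reshape((100, 3))
view = get_time_series(data, 20, 5, 4)

print(np.shares_memory(view, data))   # True: the windows live in data's buffer
print(view.flags.writeable)           # False: writing one window would alter its neighbours
dense = np.ascontiguousarray(view)    # materialises the (4, 5, 3) batch
print(np.shares_memory(dense, data))  # False: this one is a real copy
```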
edited Nov 13 at 10:49
answered Nov 12 at 12:03
jdehesa
Hi jdehesa, thanks for your answer. I forgot to mention that I want to select the rows randomly, so a sliding window is not possible. If I use your snippet and change the index calculation, it is no longer faster.
– Schaefma3
Nov 12 at 12:41
@Schaefma3 Would it be possible to shuffle first and then take the sliding window?
– jdehesa
Nov 12 at 12:43
No, because the LSTM cell learns from the past samples (rows) and makes a prediction for the next time step. Therefore, the inputs must also stay in their actual time order.
– Schaefma3
Nov 12 at 12:48
@Schaefma3 Ah, I see what you mean: you want to shuffle the examples, of course. I have added one possibility for that; check whether it could work for you (note that I made a small fix in get_time_series).
– jdehesa
Nov 12 at 12:56
This works, but the runtime is only halved. The problem is that indexing a 3-D NumPy array is extremely slow compared to a 2-D array. The fact that I now have a 3-D array of pointers therefore does not change much (apart from using less memory). Do you have another idea to increase performance significantly? Thanks for your efforts!
– Schaefma3
Nov 13 at 8:57
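To measure how the per-window loop compares with a single fancy-indexing gather on your own machine, a minimal timing harness along these lines may help. It is a sketch: loop_batch, fancy_batch and benchmark are hypothetical names, the default sizes are scaled down so it runs quickly, and the numbers it prints depend on hardware and NumPy version.

```python
import timeit
import numpy as np

def loop_batch(data, rows, look_back):
    # One copy per window, as in the question
    out = np.empty((len(rows), look_back, data.shape[1]), dtype=data.dtype)
    for j, row in enumerate(rows):
        out[j] = data[row - look_back:row]
    return out

def fancy_batch(data, rows, look_back):
    # One gather for the whole batch via a broadcast index matrix
    idx = rows[:, np.newaxis] + np.arange(-look_back, 0)
    return data[idx]

def benchmark(n_rows=10_000, n_features=20, look_back=100, batch_size=256, repeat=5):
    rng = np.random.default_rng(0)
    data = rng.random((n_rows, n_features))
    rows = rng.integers(look_back, n_rows + 1, size=batch_size)
    results = {}
    for name, fn in [("loop", loop_batch), ("fancy", fancy_batch)]:
        # Best of `repeat` single runs, in seconds
        results[name] = min(timeit.repeat(lambda fn=fn: fn(data, rows, look_back),
                                          number=1, repeat=repeat))
    return results

if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:>5}: {seconds * 1e3:.2f} ms per batch")
```

To reproduce the setting from the question, call benchmark(n_rows=100_000, look_back=1000, batch_size=2**12); each such batch holds about 655 MB of float64 data.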