Slow numpy array indexing for keras time series generator

I use a Keras time series generator to train a neural network with LSTM cells. Unfortunately, the generator has proved to be a bottleneck in training.

Below is a simplified, runnable example that shows the high runtime of the batch generator. Note that the rows of the dataset are chosen randomly, so a sliding window is not possible. During training, the CPUs run continuously at about 80% utilization, whereas GPU utilization stays in the single digits.

import time

import numpy as np


def get_time_series(data, index, look_back, batch_size):
    # `index` is not used in this simplified example
    samples1 = np.empty((batch_size, look_back, np.size(data, axis=1)))
    # Random end rows: the upper bound is the number of rows (axis=0),
    # not the number of features (axis=1)
    rows = np.random.randint(look_back, np.size(data, axis=0), size=batch_size)
    for j, row in enumerate(rows):
        # The look_back rows immediately before `row`, in time order
        indices = range(row - look_back, row, 1)
        samples1[j] = data[indices]
    return samples1


data = np.random.rand(100000, 20)
start = time.time()
batch = get_time_series(data, index=50, look_back=1000, batch_size=2**12)
print("Batch generator needs", time.time() - start, "seconds")

Result:

Batch generator needs 0.6224319934844971 seconds

I already tried building the full 3-D array up front, so that get_time_series only has to index its rows. This was about 60 times faster during training, but it leads to an out-of-memory error with large datasets.
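For scale, the arithmetic below uses the sizes from the benchmark above and float64, which np.random.rand returns. It shows why materializing every window up front runs out of memory, and how much data a single batch already contains.

```python
import numpy as np

n_rows, n_features = 100_000, 20          # data = np.random.rand(100000, 20)
look_back, batch_size = 1000, 2**12       # values from the benchmark call
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per element

# Every possible window materialized as one 3-D array
n_windows = n_rows - look_back + 1
full_bytes = n_windows * look_back * n_features * itemsize

# A single batch of randomly chosen windows
batch_bytes = batch_size * look_back * n_features * itemsize

print(f"all windows: {full_bytes / 1e9:.1f} GB")  # all windows: 15.8 GB
print(f"one batch:   {batch_bytes / 1e6:.0f} MB")  # one batch:   655 MB
```

Every batch has to copy roughly 655 MB no matter how the indices are computed. For these parameters, memory bandwidth therefore likely bounds how fast any NumPy-based generator can be.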

Does anyone have ideas on how to remove this bottleneck? For example, working with pointers, faster indexing methods, ...?
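One cheap change is worth measuring, though it is not benchmarked here. In the loop above, data[range(a, b)] is advanced (fancy) indexing, which builds an index array for every sample. A basic slice data[a:b] copies a contiguous block directly. The hypothetical get_time_series_sliced below is the question's loop with only that change, and it returns the same batch.

```python
import numpy as np


def get_time_series_loop(data, look_back, rows):
    # Question's approach: advanced indexing with a range object per sample
    out = np.empty((len(rows), look_back, data.shape[1]), dtype=data.dtype)
    for j, row in enumerate(rows):
        out[j] = data[range(row - look_back, row)]
    return out


def get_time_series_sliced(data, look_back, rows):
    # Same result, but each sample is copied with a basic slice
    out = np.empty((len(rows), look_back, data.shape[1]), dtype=data.dtype)
    for j, row in enumerate(rows):
        out[j] = data[row - look_back:row]
    return out


rng = np.random.default_rng(0)
data = rng.random((10_000, 20))
look_back = 100
rows = rng.integers(look_back, len(data) + 1, size=256)

same = np.array_equal(get_time_series_loop(data, look_back, rows),
                      get_time_series_sliced(data, look_back, rows))
print(same)  # True
```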

Thanks,
Max

edited Nov 12 at 12:33

asked Nov 12 at 11:07

Schaefma3

112


performance numpy tensorflow keras generator


1 Answer

EDIT 2:

I am not sure whether this will be any faster, but you can also do something like the following. It still relies on advanced indexing, although over contiguous data, so it may be somewhat better:

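import numpy as np


def get_time_series(data, indices, look_back):
    # Drop indices that have fewer than look_back rows before them
    indices = indices[indices >= look_back]
    # Row i of idx holds the look_back row numbers just before indices[i]
    idx = indices[:, np.newaxis] + np.arange(-look_back, 0)
    # Advanced indexing copies the windows into a (len(indices), look_back, n_features) batch
    return data[idx]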

You could use it, for example, like this:

import numpy as np


def get_time_series(data, indices, look_back):
    indices = indices[indices >= look_back]
    idx = indices[:, np.newaxis] + np.arange(-look_back, 0)
    return data[idx]


def make_batches(data, look_back, batch_size):
    # Every valid window end, shuffled once per pass over the data
    indices = np.random.permutation(np.arange(look_back, len(data) + 1))
    for i in range(0, len(indices), batch_size):
        yield get_time_series(data, indices[i:i + batch_size], look_back)


data = ...  # 2-D array of shape (time steps, features)
look_back = ...
batch_size = ...

for batch in make_batches(data, look_back, batch_size):
    # Use batch
    pass
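As a quick sanity check on small made-up data, the broadcast index matrix in this version selects the same windows as slicing each one out individually, that is, the look_back rows ending just before each chosen index:

```python
import numpy as np


def get_time_series(data, indices, look_back):
    # Same function as above
    indices = indices[indices >= look_back]
    idx = indices[:, np.newaxis] + np.arange(-look_back, 0)
    return data[idx]


data = np.arange(300).reshape((100, 3))
look_back = 5
indices = np.array([3, 20, 57, 100])  # 3 is dropped: fewer than 5 earlier rows

batch = get_time_series(data, indices, look_back)
expected = np.stack([data[i - look_back:i] for i in indices if i >= look_back])
print(batch.shape)  # (3, 5, 3)
```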

EDIT:

If you want to shuffle the examples, you could first make the sliding window for the whole dataset (this is only a strided view, so it takes practically no extra memory or time) and then take batches from a shuffled index:

# Make a sliding-window view with the strided get_time_series(data, index, look_back, batch_size) below
data_sw = get_time_series(data, 0, look_back, len(data))
# Random index
batch_idx = np.random.permutation(len(data_sw))
# To get the first batch
batch = data_sw[batch_idx[:batch_size]]
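On NumPy 1.20 or newer, numpy.lib.stride_tricks.sliding_window_view builds the same kind of read-only strided view with bounds checking, so as_strided is not needed. It appends the window axis last, hence the swapaxes to get (windows, look_back, features). Here is a sketch of the shuffled-batch idea with it, assuming that NumPy version:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(300).reshape((100, 3))
look_back, batch_size = 5, 4

# View of shape (n_windows, features, look_back) -> (n_windows, look_back, features);
# no data is copied here
data_sw = sliding_window_view(data, look_back, axis=0).swapaxes(1, 2)

# Shuffle window positions, then gather one batch (this copy is the real cost)
rng = np.random.default_rng(0)
batch_idx = rng.permutation(len(data_sw))
batch = data_sw[batch_idx[:batch_size]]

print(data_sw.shape, batch.shape)  # (96, 5, 3) (4, 5, 3)
```

Window i of data_sw is data[i:i + look_back], so each window still keeps its rows in time order.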

I think this does what you want, and it should be considerably faster than using loops:

import numpy as np
from numpy.lib.stride_tricks import as_strided


def get_time_series(data, index, look_back, batch_size):
    # Index should be at least as big as look_back to have enough elements before it
    index = max(index, look_back)
    # Batch size should not go beyond the array
    batch_size = min(batch_size, len(data) - index + 1)
    # Relevant slice for the batch
    data_slice = data[index - look_back:index + batch_size]
    # Reshape with stride tricks as a "sliding window": consecutive windows
    # start one row apart, so the first two axes both step by one row
    data_strides = data_slice.strides
    batch_shape = (batch_size, look_back, data_slice.shape[-1])
    batch_strides = (data_strides[0], data_strides[0], data_strides[1])
    return as_strided(data_slice, batch_shape, batch_strides, writeable=False)


# Test
data = np.arange(300).reshape((100, 3))
batch = get_time_series(data, 20, 5, 4)
print(batch)

Output:

[[[45 46 47]
  [48 49 50]
  [51 52 53]
  [54 55 56]
  [57 58 59]]

 [[48 49 50]
  [51 52 53]
  [54 55 56]
  [57 58 59]
  [60 61 62]]

 [[51 52 53]
  [54 55 56]
  [57 58 59]
  [60 61 62]
  [63 64 65]]

 [[54 55 56]
  [57 58 59]
  [60 61 62]
  [63 64 65]
  [66 67 68]]]
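A small test on the same data confirms that the strided result is a read-only view rather than a copy. It also checks that window i covers rows index - look_back + i through index + i - 1:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def get_time_series(data, index, look_back, batch_size):
    # Strided sliding window from the answer above
    index = max(index, look_back)
    batch_size = min(batch_size, len(data) - index + 1)
    data_slice = data[index - look_back:index + batch_size]
    row_stride, col_stride = data_slice.strides
    batch_shape = (batch_size, look_back, data_slice.shape[-1])
    return as_strided(data_slice, batch_shape,
                      (row_stride, row_stride, col_stride), writeable=False)


data = np.arange(300).reshape((100, 3))
index, look_back = 20, 5
batch = get_time_series(data, index, look_back, 4)

# Window i is the look_back rows ending just before row index + i
for i in range(len(batch)):
    assert np.array_equal(batch[i], data[index - look_back + i:index + i])
print(np.may_share_memory(batch, data), batch.flags.writeable)  # True False
```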

edited Nov 13 at 10:49

answered Nov 12 at 12:03

jdehesa

22.2k43150

Hi jdehesa, thanks for your answer. I forgot to mention that I want to select the samples randomly, so no sliding window is possible. If I use your snippet and change the index calculation, it is no longer faster.
– Schaefma3
Nov 12 at 12:41

@Schaefma3 Would it be possible to shuffle first and then take the sliding window?
– jdehesa
Nov 12 at 12:43

No, because the LSTM cell learns from the past samples (rows) and makes a prediction for the next time step. Therefore, the inputs must stay in their actual time order.
– Schaefma3
Nov 12 at 12:48
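The disagreement here can be checked directly. Permuting the first axis of a sliding-window array shuffles which windows go into a batch, but each window's rows stay in their original time order. The illustration below uses a made-up single-feature series in which each value equals its time step:

```python
import numpy as np

look_back = 4
series = np.arange(20).reshape(-1, 1)  # value == time step, 1 feature

# All windows (copied here for simplicity): shape (17, look_back, 1)
windows = np.stack([series[i:i + look_back]
                    for i in range(len(series) - look_back + 1)])

rng = np.random.default_rng(1)
shuffled = windows[rng.permutation(len(windows))]

# Each shuffled window is still consecutive and increasing in time
steps = np.diff(shuffled[:, :, 0], axis=1)
print(bool((steps == 1).all()))  # True
```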

@Schaefma3 Ah, I see what you mean: you want to shuffle the examples, of course. I have added one possibility for that; check whether it could work for you (note that I also made a small fix in get_time_series).
– jdehesa
Nov 12 at 12:56

This works, but the runtime is only halved. The problem is that indexing into a 3-D NumPy array is extremely slow compared to a 2-D array. The fact that I now have a 3-D view (essentially an array of pointers into the original data) therefore does not change much, apart from using less memory. Do you have another idea for increasing performance significantly? Thanks for your efforts!
– Schaefma3
Nov 13 at 8:57
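One further idea, offered only as an unbenchmarked sketch. Every batch still has to be copied out of data: 2**12 × 1000 × 20 float64 values, about 655 MB, for the question's parameters. The output buffer, though, does not have to be reallocated for every batch. np.take accepts an out array. Its documentation notes that out is always buffered with mode='raise', so mode='clip' is used here. That mode silently clips out-of-range indices, so the indices must be valid beforehand. The helper name fill_batch is hypothetical.

```python
import numpy as np


def fill_batch(data, rows, look_back, out):
    # Hypothetical helper: gather the windows ending just before each row into a
    # preallocated buffer of shape (len(rows), look_back, n_features)
    idx = rows[:, np.newaxis] + np.arange(-look_back, 0)
    # mode="clip" avoids the extra buffering that mode="raise" does with out=,
    # but it does NOT check bounds, so rows must lie in [look_back, len(data)]
    np.take(data, idx, axis=0, out=out, mode="clip")
    return out


rng = np.random.default_rng(0)
data = rng.random((10_000, 20))
look_back, batch_size = 100, 256

buf = np.empty((batch_size, look_back, data.shape[1]), dtype=data.dtype)
for _ in range(3):  # reuse the same buffer for every batch
    rows = rng.integers(look_back, len(data) + 1, size=batch_size)
    batch = fill_batch(data, rows, look_back, buf)
```

If Keras prefetches batches (for example through a queue or several workers), a single shared buffer could be overwritten while a batch is still in use. In that case, a small pool of buffers would be needed instead.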
