
Input and target format for multidimensional time-series regression #4870


Closed
bbogart opened this issue Dec 29, 2016 · 36 comments

Comments

@bbogart

bbogart commented Dec 29, 2016

I'm trying to solve a problem I had intended for tensorflow in Keras.

I've gotten a lot further using Keras, but I'm still unclear on how best to represent my sequence data. The following code works quite well using only one input sample and one target sample:

import numpy as np

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# This does work by using only one sample:
data = [[0,0,0,0,0,0,0,0,0,2,1]]
data = np.array(data, dtype=float)
target = [0,0,0,0,0,0,0,0,2,1,0]
target = np.array(target, dtype=float)

data = data.reshape((1, 1, 11)) # Single batch, 1 time step, 11 dimensions
target = target.reshape((-1, 11)) # Corresponds to shape (None, 11)


# Build Model
model = Sequential()  
model.add(LSTM(11, input_shape=(1, 11), unroll=True))
model.add(Dense(11))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, nb_epoch=1000, batch_size=1, verbose=2)

# Do the output values match the target values?
predict = model.predict(data)
print repr(data)
print repr(predict)

But I can't get it to work with multiple samples:

import numpy as np

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# Input sequence
wholeSequence = [[0,0,0,0,0,0,0,0,0,2,1],
                 [0,0,0,0,0,0,0,0,2,1,0],
                 [0,0,0,0,0,0,0,2,1,0,0],
                 [0,0,0,0,0,0,2,1,0,0,0],
                 [0,0,0,0,0,2,1,0,0,0,0],
                 [0,0,0,0,2,1,0,0,0,0,0],
                 [0,0,0,2,1,0,0,0,0,0,0],
                 [0,0,2,1,0,0,0,0,0,0,0],
                 [0,2,1,0,0,0,0,0,0,0,0],
                 [2,1,0,0,0,0,0,0,0,0,0]]

# Preprocess Data: (This does not work)
wholeSequence = np.array(wholeSequence, dtype=float) # Convert to NP array.
data = wholeSequence[:-1] # all but last
target = wholeSequence[1:] # all but first

# This does not work:
# Reshape training data for Keras LSTM model
# The training data needs to be (batchIndex, timeStepIndex, dimensionIndex)
data = data.reshape((1, 9, 11)) # Single batch, 9 time steps, 11 dimensions
target = target.reshape((-1, 11)) # Corresponds to shape (None, 11)


# Build Model
model = Sequential()  
model.add(LSTM(11, input_shape=(9, 11), unroll=True))
model.add(Dense(11))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, nb_epoch=1000, batch_size=1, verbose=2)

# Do the output values match the target values?
predict = model.predict(data)
print repr(data)
print repr(predict)

Due to this error: ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 9 target samples.

What am I doing wrong with the array shapes?

It would be really nice if Keras supported this use case directly, so that a single data structure holds the sequence and the fitter knows that for each input X_t the target is X_(t+1). This would provide benefits such as the following:

  • There would be no redundancy in storing the data and targets separately.

  • One would not have to be concerned with the shape of the input and targets separately.

@bbogart bbogart changed the title Data and target format for multidimensional regression Input and target format for multidimensional time-series regression Dec 29, 2016
@bstriner
Contributor

If you want a prediction for each timestep, set return_sequences=True when creating the LSTM.

If you want to run the same dense model after each timestep after the LSTM, use TimeDistributed(Dense(11)).

Run the below so you can see if your shapes all line up:

model.summary()
print "Inputs: {}".format(model.input_shape)
print "Outputs: {}".format(model.output_shape)
print "Actual input: {}".format(data.shape)
print "Actual output: {}".format(target.shape)

Cheers,
Ben

@bbogart
Author

bbogart commented Dec 31, 2016

Thank you for the reply Ben.

First, I'm not sure I'm understanding what a "sample" means in the Keras context. I've been thinking one sample = one time-step, but it seems to mean something else, as a sample can have multiple vectors each of multiple dimensions, whereas I expect only one vector of multiple dimensions for each time-step. Maybe sample actually means batch? (a sampling of input vectors?)

Second, in my single-vector example above, I see the following shape outputs from your code:

Inputs: (None, 1, 11)
Outputs: (None, 11)
Actual input: (1, 1, 11)
Actual output: (1, 11)

I don't know how to reshape to (None, 11); the following code does not appear to do so:

target = target.reshape((-1, 11)) # Corresponds to shape (None, 11)

In my multivector example, I see the same pattern of shapes:

Inputs: (None, 9, 11)
Outputs: (None, 11)
Actual input: (1, 9, 11)
Actual output: (9, 11)

But this case produces the error above, even though the mismatch with (None, 11) looks the same as in the single-vector case: there (1, 11) != (None, 11), and here (9, 11) != (None, 11).

I don't understand your suggestions regarding return_sequences or TimeDistributed; I'm not even sure I need the final dense layer at all, but saw it in an example somewhere.

To zoom back to the big picture: Once I get the model to fit the data, I'll want to seed the model with a real data time-step, and then use that prediction (eventually with integer clamping) as the next input to recreate the whole sequence:

prediction = data[0] # first input vector
for step in xrange(steps):
    prediction = model.predict(prediction)
    print step, prediction

@bstriner
Contributor

The first dimension of your data is the batch dimension. It will show up as None. It can be any size, as long as it is the same for your inputs and targets. When you're dealing with LSTMs, the batch dimension is the number of sequences, not the length of the sequence.

LSTMs in Keras are typically used on 3d data (batch dimension, timesteps, features).
LSTM without return_sequences will output (batch dimension, output features)
LSTM with return_sequences will output (batch dimension, timesteps, output features)

So, if your input shape is (None, 9, 11) and your actual input shape is (1, 9, 11) that means your batch dimension is 1. If your output shape is (None, 11), then your actual targets need to be (1,11).

The loop you're describing isn't the way to do things in Keras. Pass a single array with shape (number of sequences, steps, features) to compute the entire series in one go. That way errors can backpropagate through time.

Are you trying to do time-series prediction? I'm not sure what you're trying to build.

Basic timeseries data has an input shape (number of sequences, steps, features). Target is (number of sequences, steps, targets). Use an LSTM with return_sequences.

Probably skip the dense layer for now until you have the basics working.

Cheers,
Ben

@bbogart
Author

bbogart commented Jan 3, 2017

Following is my working code (with very low loss). Thanks for the help. I still think a fit function for large sequences that automatically draws random, shuffled batches would be a nice feature, rather than storing redundant information.

import numpy as np

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# Input sequence
wholeSequence = [[0,0,0,0,0,0,0,0,0,2,1],
                 [0,0,0,0,0,0,0,0,2,1,0],
                 [0,0,0,0,0,0,0,2,1,0,0],
                 [0,0,0,0,0,0,2,1,0,0,0],
                 [0,0,0,0,0,2,1,0,0,0,0],
                 [0,0,0,0,2,1,0,0,0,0,0],
                 [0,0,0,2,1,0,0,0,0,0,0],
                 [0,0,2,1,0,0,0,0,0,0,0],
                 [0,2,1,0,0,0,0,0,0,0,0],
                 [2,1,0,0,0,0,0,0,0,0,0]]

# Preprocess Data:
wholeSequence = np.array(wholeSequence, dtype=float) # Convert to NP array.
data = wholeSequence[:-1] # all but last
target = wholeSequence[1:] # all but first

# Reshape training data for Keras LSTM model
# The training data needs to be (batchIndex, timeStepIndex, dimensionIndex)
# Single batch, 9 time steps, 11 dimensions
data = data.reshape((1, 9, 11))
target = target.reshape((1, 9, 11))

# Build Model
model = Sequential()  
model.add(LSTM(11, input_shape=(9, 11), unroll=True, return_sequences=True))
model.add(Dense(11))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, nb_epoch=2000, batch_size=1, verbose=2)

@bbogart
Author

bbogart commented Jan 4, 2017

I still think it would be nice if a Sequential.fit() variant were provided that manages input/target pairs when the length and shape of the input and output are the same, as described in my initial post.

@stale

stale bot commented May 23, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

@galfaroi

galfaroi commented Jun 18, 2017

What if you have a train and test set reshaped like this:
x_train = x_train.reshape(1, 1870, 14)
y_train = y_train.reshape(1, 1870, 14)
and I fit with model.fit(x_train, y_train, epochs=10, batch_size=1, verbose=2)?
But my test data is:
x_test = x_test.reshape(1, 200, 14)
y_test = y_test.reshape(1, 200, 14)

Training works, but validation doesn't: since my test series is only 200 steps long, it fails because the model expects the longer time series of 1870.

Thanks, I'll appreciate the help.

@stale stale bot removed the stale label Jun 18, 2017
@td2014
Contributor

td2014 commented Jun 18, 2017

Can you please clarify what you are trying to do with your model? Thanks.

@bstriner
Contributor

@galfaroi Ideally you should break out overlapping windows. Instead of one sequence of 1870, you could have many sequences of let's say 20. Your sequences should be overlapping windows [0-20], [1-21], [2-22], etc, so your final shape would be something like (1850, 20, 14).

Same process for your test data. Break into subsequences of the same length as training.

You will have to play around with finding what a good subsequence length is.

It is extremely important to have many different ways of slicing your data. If you train on just one super long sequence it will probably not learn anything interesting.

Also, depending on what you decide to do, it may be better to generate subsequences as part of a generator function instead of doing it ahead of time. Check the keras docs for how to write a generator.
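A minimal numpy sketch of the overlapping-window slicing described above, assuming (1870, 14) arrays named x_train and y_train and a window length of 20 (the names and the window length are placeholders):

import numpy as np

window = 20  # placeholder subsequence length; tune for your data

# Stand-ins for the (1870, 14) training arrays discussed above
x_train = np.random.rand(1870, 14).astype('float32')
y_train = np.random.rand(1870, 14).astype('float32')

# Overlapping windows [0:20], [1:21], ... along the time axis
n_windows = len(x_train) - window + 1  # 1851 windows
x_windows = np.stack([x_train[i:i + window] for i in range(n_windows)])
y_windows = np.stack([y_train[i:i + window] for i in range(n_windows)])

print(x_windows.shape, y_windows.shape)  # (1851, 20, 14) (1851, 20, 14)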

@ss32

ss32 commented Jul 9, 2017

I came across this searching for a solution to the same problem, but I'm still not 100% sure how to solve my issue. My input array consists of N timesteps of 2 samples with an embedded lag of 22 timesteps. My target is a scalar of N timesteps. When I try to fit I get an error complaining about 2 input samples and 1 target sample, but isn't LSTM used for scalar outputs? How do I go from multi-dimensional inputs to a scalar output?

ValueError: Input arrays should have the same number of samples as target arrays. Found 2 input samples and 1 target samples.


Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 20000, 2)          200       
_________________________________________________________________
dense_1 (Dense)              (None, 20000, 1)          3         
=================================================================
Total params: 203.0
Trainable params: 203
Non-trainable params: 0.0
_________________________________________________________________
Inputs: (None, 20000, 22)
Outputs: (None, 20000, 1)
Actual input: (2, 20000, 22)
Actual output: (1, 20000, 1)

@bstriner
Contributor

bstriner commented Jul 9, 2017

@ss32 dimensions are (samples, timesteps, features). "Samples" are data that goes together so it doesn't make any sense when you say you have 2 input samples and 1 output sample. Maybe you are saying you have 1 input sample (with 44 features for each timestep). Maybe you are saying you have 2 input samples and 2 output samples, but the 2 output samples are the same.

The data you're describing is impossible, so please explain what you are trying to do with 2 input samples and 1 output sample. What is your data supposed to represent?

Number of input and output samples have to be the same because a "sample" is an input and output combination.

You might also just be confusing samples and features. 2 input features and 1 output feature is just input (1, 20000, 2) output (1, 20000, 1).

On a related note, please do not try to pass a 20000-length sequence to your LSTM. It will give you junk (unless you're just looking to make an art project or something). The best strategy is to slice out many subsequences of maybe length 40 (something bigger than your expected time lag) so your shapes are (None, 40, 2) and (None, 40, 1). If you have very many different possible subsequences you will learn a model that works marginalized over all possible positions in the sequence. LSTM will get slow after maybe 10 steps, so 20000 is kind of silly, especially if you don't think you have relationships at a lag of 19999. Also, if you only have 1 sample, nothing will generalize. You can break that 1 sample into 19960 subsequences of length 40 and you might learn something more meaningful.

Cheers

@ss32

ss32 commented Jul 9, 2017

@bstriner

a "sample" is an input and output combination.

That helps. I was under the impression that a sample would be a reading of the input at some time.

Let's call my input A and my output B. I'm trying to predict B given an input of time and A. A and B are strongly correlated, and B has some pseudo-periodic behavior over long enough time scales, so I would like to predict B for some A at a given time. I have found that I can predict B when looking back at the past ~22 values for A, hence the (2,22,N) array.

On a related note, please do not try to pass a 2000 length sequence to your LSTM. It will give you junk (unless you're just looking to make an art project or something).

Maybe LSTM is the wrong layer to use then. I have the model working as an MLP in Matlab and would like to get it implemented in Keras/Tensorflow. Regardless, the type of layer doesn't solve my problem of wanting to use a multi-dimensional input; I have tried using all dense layers as well. I found a similar issue here: #1904 and tried implementing the solution but still ran into the same problem.

@bstriner
Contributor

bstriner commented Jul 9, 2017

@ss32 LSTM is the right layer to use. The problem is that LSTM learns an initial state. If it only ever starts in the same position, it will learn something that might only work from the same starting position. If you train it on different subsequences from different starting positions then it will try to learn something that will work starting from anywhere. This doesn't require any changes to your keras model, but changes to your input and output shaping and preprocessing.

You have a few choices. Easiest option is to flatten everything. If the inputs are (n,depth,3,5) and (n,depth,6), reshape and concatenate into (n,depth,21) then use an LSTM as usual.
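For illustration, a small numpy sketch of that reshape-and-concatenate step, with made-up sizes:

import numpy as np

# Hypothetical inputs shaped (n, depth, 3, 5) and (n, depth, 6)
n, depth = 32, 40
a = np.random.rand(n, depth, 3, 5).astype('float32')
b = np.random.rand(n, depth, 6).astype('float32')

# Flatten the trailing dims of `a`, then join along the feature axis:
# (n, depth, 15) concatenated with (n, depth, 6) gives (n, depth, 21)
combined = np.concatenate([a.reshape(n, depth, -1), b], axis=-1)
print(combined.shape)  # (32, 40, 21)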

I don't see where you are running into problems. Please explain what your data is so we can decide what shape it is supposed to be instead of the other way around.

Standard usage:

  • input sequences are (n, steps, input features)
  • output sequences are (n, steps, output features)
  • Run LSTM with return_sequences=True

If you have only one long sequence instead of short sequences, try breaking them up into reasonably sized chunks.

Cheers

@ss32

ss32 commented Jul 10, 2017

@bstriner

The problem is that LSTM learns an initial state. If it only ever starts in the same position, it will learn something that might only work from the same starting position. If you train it on different subsequences from different starting positions then it will try to learn something that will work starting from anywhere. This doesn't require any changes to your keras model, but changes to your input and output shaping and preprocessing.

You have a few choices. Easiest option is to flatten everything. If the inputs are (n,depth,3,5) and (n,depth,6), reshape and concatenate into (n,depth,21) then use an LSTM as usual.

I see, so I need to break up my N data points into smaller chunks?

I managed to get it working using SimpleRNN but the fit is awful. Now it's a matter of tuning the model and embedding the time delay again. My input is now (N,2,1) with target (N, ).

Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn_1 (SimpleRNN)     (None, 1)                 3         
_________________________________________________________________
dense_1 (Dense)              (None, 28)                56        
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 29        
=================================================================
Total params: 88.0
Trainable params: 88
Non-trainable params: 0.0
_________________________________________________________________
Inputs: (None, None, 1)
Outputs: (None, 1)
Actual input: (60000, 2, 1)
Actual output: (60000,)

I'm predicting the solar index, Kp, based on the date and the Z component of the Earth's magnetic field, and the data is "continuous" in that it's a giant sequence of readings going back to the '60s. I've found through a Matlab NARXnet that embedding a time delay of 22 timesteps gives the most accurate predictions, so I'm trying to implement the same network in Keras/TF for validation, and it should be less computationally expensive.

Matlab uses a recurrent network for time series data. x(t) is my input of [date, bZ] and y(t) is the Kp readings, my targets. The activation function for the hidden layer is basically a sigmoid (Matlab calls it a tansig), and then a linear activation on the output layer, so this is what I've built into my Keras model.
Matlab NARXnet

Hopefully that clears things up. Thanks for your help.

edit: One last comment that might clear up the embedding delay - For each input x(t) I need an array of [x(t) x(t-1) x(t-2)...x(t-21)] where x is itself 2 dimensional [date,bZ], so the full set of inputs would be an array of size (22,2,N) for N time steps, and I want to predict the corresponding y(t) for each time step t.
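Following the (samples, timesteps, features) convention discussed above, that lagged input would typically be arranged as (samples, 22, 2) with one scalar target per sample. A rough sketch, using random stand-ins dates_bz (N, 2) and kp (N,) for the real data:

import numpy as np

lag = 22   # look-back window from the NARX experiments described above
N = 1000   # placeholder number of timesteps

dates_bz = np.random.rand(N, 2).astype('float32')  # [date, bZ] per timestep
kp = np.random.rand(N).astype('float32')           # Kp reading per timestep

# Each sample holds x(t-21) ... x(t); the target is y(t).
x = np.stack([dates_bz[t - lag + 1:t + 1] for t in range(lag - 1, N)])
y = kp[lag - 1:]
print(x.shape, y.shape)  # (979, 22, 2) (979,)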

@Tchaikovic

@bstriner do you have example code that works with the (samples, timesteps, features) dimensions you described?

@bstriner
Contributor

@Tchaikovic did you have a dataset or a problem in mind? All LSTMs in Keras use (samples, timesteps, features), so any LSTM is an example. The important thing when using them is to understand your dataset so you know what is what, and to do any necessary preprocessing or reshaping to get the data into the right shape.

So, for example, let's say you want to train a language model on 6-word sequences, and you have a vector of words of length n. There are k unique words. First, one-hot encode the words (n,) -> (n, k). Then, roll and concatenate the array with itself (n, k) -> (n-5, 6, k). This array has every 6-word sequence in the data. Each word is encoded as k features (one-hot).
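A small numpy sketch of that one-hot-and-roll step, with made-up values for n and k:

import numpy as np

n, k, seq_len = 1000, 50, 6          # placeholder corpus length, vocab size, window
words = np.random.randint(0, k, n)   # stand-in for the length-n vector of word indices

one_hot = np.eye(k, dtype='float32')[words]  # (n,) -> (n, k)

# Every consecutive 6-word window: (n, k) -> (n - 5, 6, k)
sequences = np.stack([one_hot[i:i + seq_len] for i in range(n - seq_len + 1)])
print(sequences.shape)  # (995, 6, 50)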

@vgupta13

@bstriner, I am working on time-series prediction (daily frequency), and I am confused by the samples/timesteps/features concept. Let's assume I have 12 features for predicting my output value, and 1 year of data, i.e. 365 rows. Should I reshape my data as (365, 365, 12)? Should one row, a sequence of 12 features, be interpreted as 1 timestep? I'd appreciate your help with this!

@bstriner
Contributor

bstriner commented Sep 1, 2017

LSTM is probably not going to learn that much over 365 timesteps. Also, you may not be looking for patterns that are 365 timesteps away.

You can shape your data (1,365,12) and run it in one go. The problem is that will probably not generalize or be very meaningful. You are learning a single function that predicts a single batch of data.

Ideally, decide something like 2 weeks is a reasonable amount of data to make a prediction. Reshape your data as (341, 14, 12). That is each 2-week subsequence. There will be repetition. However, you are now learning a function that works on every 2-week subsequence, so the function is more likely to generalize to other data.

Since this involves repetition, it may be more efficient to generate data on the fly using fit_generator. Randomly sample a bunch of starting points, then select those subsequences.
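A minimal sketch of such a generator, assuming a single (365, 12) array called series and 14-day windows (names and sizes are placeholders); using the following timestep as the target is purely illustrative:

import numpy as np

def subsequence_generator(series, window=14, batch_size=32):
    # series: (days, features). Yields random overlapping windows plus,
    # as an illustration only, the following timestep as the target.
    max_start = len(series) - window - 1
    while True:
        starts = np.random.randint(0, max_start + 1, size=batch_size)
        x = np.stack([series[s:s + window] for s in starts])  # (batch, 14, 12)
        y = np.stack([series[s + window] for s in starts])    # (batch, 12)
        yield x, y

series = np.random.rand(365, 12).astype('float32')
# model.fit_generator(subsequence_generator(series), steps_per_epoch=100, epochs=10)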

@vgupta13

vgupta13 commented Sep 1, 2017

@bstriner Thanks for the explanation :) — that is exactly the core thing I was looking for. However, could you please tell me why you chose 341? My actual problem is to learn from the past 10 years of data and make predictions over a 4-week window; however, I currently have only about 1.5 years of data, so I'm trying to build an intuition for how the model works. Should I frame it as (365, 28, 12)? Will the model then learn 4-week sequence patterns?

@bstriner
Contributor

bstriner commented Sep 1, 2017

If you have 5 days of data and look at 3 day windows, there are 3 windows (5-3+1). If you have 365 days and 14 day windows there should be 352 possible sequences. ignore the 341.

You want to make a 1 month window prediction based on what? The previous 11 months? Then your input should be 365 days of data. If you have 10 years of data that is 365*10 possible sequences. So (365*10, 365, 12).

If days of the week or which month is part of your model, then you can change things up. The point is that the first dimension is how many sequences, and you want as many as possible so the thing will generalize meaningfully.

As an example of what not to do, make a model that is (1, 365*10, 12). That will learn a function that outputs all 10 years of data, once. The problem is, it will only work on that 10 years of data. If you give it just 9 years of data it might give you something else entirely, because it has only seen one sequence. It might work well on training data but might be junk for other data, because there is in effect only one piece of training data.

So, get creative, and however you can slice things up to make multiple sequences should make a better model.

@nbucklin

nbucklin commented Sep 7, 2017

Hi @bstriner, hope you don't mind another question! Earlier in this thread there was a discussion about the dimensions of the input and target arrays. My input array is (1500, 50, 2) for 1500 samples, 100 timesteps, and 2 features. My target array is just (1500,), consisting of the binary values I'm trying to predict. So far the model I've designed has worked, but I was wondering whether the way it's set up makes sense.

@bstriner
Contributor

bstriner commented Sep 9, 2017

@nbucklin "worked" is hard to define. You can normally get a model to run but the question is does it do anything meaningful. Do you have any validation data? From your description, you have 1500 independent sequences, each sequence is 100 timesteps, and each sequence predicts a single value. You are never going to use your model on sequences other than 100 timesteps.

If that is correct, you shouldn't run into the types of issues discussed here. The generalization issues discussed here arise when you have one very long sequence and don't break it up into subsequences. You also run into generalization issues when you want to use the model on sequences that are different sizes than what you trained it on.

If all 1500 are independent sequences, you should set aside a handful as validation.

One minor note: if you don't think the model needs all 100 timesteps to make a decision, you can use return_sequences=True. That way the LSTM will output a guess at each timestep and might get it correct before the 100th timestep. In that case, your targets would be (1500,100).
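A short sketch of that per-timestep variant, assuming (1500, 100, 2) inputs and one binary label per timestep; the layer size is arbitrary, and the targets carry a trailing axis of 1 to match the TimeDistributed Dense output:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

x = np.random.rand(1500, 100, 2).astype('float32')             # stand-in inputs
y = np.random.randint(0, 2, (1500, 100, 1)).astype('float32')  # one label per timestep

model = Sequential()
model.add(LSTM(16, return_sequences=True, input_shape=(100, 2)))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(x, y, epochs=2, batch_size=32, verbose=2)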

Cheers

@nbucklin

@bstriner Thanks for the advice! That all makes sense to me. Appreciate your continued help in this thread.

@cloudynet1

Hi bstriner,
My data has n packets (captured from network traffic), and each packet has many features f (one of them is time). Example rows from train_x, which has shape (4000, 41):
      f1 f2 f3 …
pkt1   2  3  3
pkt2   1  3  5
pkt3   2  3  2
pkt4   5  3  1
pkt5   5  3  2
….
The labels are train_y with shape (4000, 1), one label per packet.
I want to use an LSTM to classify adjacent packets as normal or abnormal (the timesteps value can be changed arbitrarily). When I set timesteps=2, each subsequence has 2 rows and the sliding-window stride is 1. After reshaping, it looks like this:
[[[2 3 3]
[3 5 1]]
[[3 5 1]
[2 3 2]]
…. ]
The shape of train_x is then (3998, 2, 41), which is easy to build with:
train_x = np.array([train_x[i:i+timesteps] for i in range(len(train_x) - timesteps)])
The shape of train_y must be (3998, 1). I wrote either:
(1) train_y = np.array(train_y[:train_y.shape[0] - timesteps])
or:
(2) train_y = np.array(train_y[timesteps:train_y.shape[0]])
Which one is correct, (1) or (2)? When timesteps is large, the right label may be hard to pick exactly.
Could you check whether my whole process above is correct?
One more question: sometimes I see the data split into separate, non-overlapping chunks, like this:
[[[2 3 3]
[3 5 1]]
[[2 3 2]
[5 3 1]]
… ]
When should the data be split that way?
Thank you.

@alimpolat

Hello everyone,
I am new to Keras and I have a question.
I have a traffic prediction problem that might be solved with an LSTM.
Basically, my data has road segments as rows and the speed profile of each segment as columns; there are 288 columns, and each column represents one 5-minute interval of the segment's speed profile.
One such table represents one day's speed profile, and I have daily speed-profile data for the last 3 months.

The idea is to build a Keras LSTM model from these 3 months of data and use it for speed-profile prediction. We have a real-time speed profile updated every 5 minutes, and the aim is to use that day's speed-profile updates to predict the speed profile 12 steps ahead: take in each 5-minute update, predict the next 12 steps, and continue like that.

Best Regards,
Alim

@ghost

ghost commented Oct 25, 2017

Hi All,

A fundamental question from my end, I have an input 48672 x 7 which I broke up into overlapping sequences of 96 time steps. So the input to my LSTM is (48577, 96 , 1) and target is (48577, 96, 1), hence I'm trying to predict the next 96 time-steps using a LSTM network.

My first Keras model is
model = Sequential()
model.add(LSTM(24, return_sequences=True, input_shape=(96, 7), implementation=2, unroll=True))
model.add(Dropout(0.5))
model.add(LSTM(12, return_sequences=True, implementation=2, unroll=True))
model.add(Dropout(0.5))
model.add(LSTM(8, return_sequences=True, implementation=2, unroll=True))
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))

and my second model is
model = Sequential()
model.add(LSTM(96, return_sequences=True, input_shape=(96, 7), implementation=2, unroll=True))
model.add(Dropout(0.5))
model.add(LSTM(48, return_sequences=True, implementation=2, unroll=True))
model.add(Dropout(0.5))
model.add(LSTM(8, return_sequences=True, implementation=2, unroll=True))
model.add(Dropout(0.5))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))

Both models compile and run !

So my questions are:

  1. How is the model able to unroll over 96 time-steps using 24 LSTM cells? Isn't unroll length = sequence length?

  2. Through the TimeDistributed(Dense()) I am making a weight update for each one of the 96 time-steps, am I correct?

  3. Does Keras feed each time-step as a 7-D vector to a LSTM cell, with unroll length = 96 this would correspond to the horizontal LSTM stack of 96 cells capturing the transitions between the time-steps through the states flowing through the layer, right?

Any thoughts? @bstriner @fchollet

@bstriner
Contributor

In your description I think you meant that the input is (48577, 96, 7).

  1. 24 units means 24 dimensions in the hidden vector and the output vector at each timestep. Units and steps are unrelated.
  2. You are applying a single matrix over each time step. The output of your LSTM will be (n, 96, 24), and the Dense layer applies a 24x1 linear map at each step. As a side note, due to the broadcasting rules, you should be able to use Dense without the TimeDistributed wrapper and it should still work.
  3. Basically, yes.

Another side note: I think time-locked dropout is becoming standard now if you have sequential data.

@bstriner
Contributor

No particular reason in your input as to why you need to unroll. I'd be interested in what kind of performance differences you get unrolling v not and w/ different implementation modes.

@ghost

ghost commented Oct 27, 2017

Thanks for your comments @bstriner , yes I meant that my input is (48577, 96, 7). Could you please clarify the following points as well?

  1. As I understand it, there is one weight matrix per layer that transforms the input from the previous layer into the new dimension given by the number of units (24 or 96). This weight matrix is shared by all time-steps of the input to that layer.

  2. I tried implementing it with just Dense, and I got an error saying that the dimensions of the target were wrong when executing model.compile.

  3. I will check time-locked dropout and will also post the results of my performance differences.

Could you please clarify an additional point: how are the weights updated? Are they updated to minimize the error over all time-steps of a single sequence, or over a batch of sequences, as in classical mini-batch optimization? Also, is Backpropagation Through Time (BPTT) the default optimization strategy in Keras LSTMs?

@ghost

ghost commented Oct 27, 2017

Sorry, I forgot to include a point: unroll creates multiple copies of the LSTM cell, one for each time-step. So the number of units (24/96) is unrelated to the number of time-steps, and enabling/disabling unroll likewise has no relation to the number of time-steps. Am I right?

@gurtej02

gurtej02 commented Feb 6, 2018

@bstriner Hello Ben, I would like to ask you a question; hopefully you can help me with it.

I am working on a dataset which has multiple time series, each corresponding to a particular location and a particular search query at that location (a location corresponds to a Designated Marketing Area, and a search term could be anything like air conditioner, sunglasses, etc.). You can view it as location, search query, and timestamp forming a primary key. I have encoded the search query and location using one-hot encoding. The data is further appended with weather parameters (snowfall, precipitation, temperature, etc.) and Google Trends data, which are daily time-series features for each row and for all of the available time series.

Having reached this stage, I am a little confused about how I should represent my data so that I can logically feed it to an LSTM-based network. I understand that Keras takes a 3-dimensional tensor as input (in the format samples, timesteps, features), and I understand what it means to represent a single time series in that format, but I'm not sure how to represent the dataset I described. Is an LSTM-based model even a good choice for generalizing features over multiple time series as input? I want to predict the Google Trends data, given the historical data, for multiple timesteps into the future. I am thinking of applying an encoder-decoder LSTM architecture, but first I would like to create a simple model with one LSTM layer followed by a densely connected layer.

Thanks for the help in advance!

@HTLife

HTLife commented Feb 7, 2018

@bstriner How can I modify your sample code to use a generator to feed the data?
I'm trying to feed images into a time-distributed CNN + LSTM, but I cannot get the dimensions right.
The goal is to extract image features with a CNN, then combine the 3 features from 3 images and feed them into an LSTM.

Input image
(540, 960, 1) ==> (x,y,ch) ==> CNN ==> (m,n,k)┐
(540, 960, 1) ==> (x,y,ch) ==> CNN ==> (m,n,k)---> (3, m,n,k) --flatten--> (3, mnk)
(540, 960, 1) ==> (x,y,ch) ==> CNN ==> (m,n,k)」

(3, mnk) => LSTM => predict three regression values

model = Sequential()
model.add(TimeDistributed(Conv2D(16, (7, 7), padding='same'),input_shape=(None, 540, 960, 1)))
model.add(Activation('relu'))

model.add(TimeDistributed(Conv2D(32, (5, 5), padding='same')))
model.add(Activation('relu'))

model.add(TimeDistributed(Flatten()))
model.add(LSTM(num_classes, return_sequences=True))

model.compile(loss='mean_squared_error', optimizer='adam')

The generator

a = readIMG(filenames[start])  # (540, 960, 1)
b = readIMG(filenames[start + 1])  # (540, 960, 1)
c = readIMG(filenames[start + 2])  # (540, 960, 1)
x_train = np.array([[a, b, c]])  # (1, 3, 540, 960, 1)

ValueError: Error when checking target: expected lstm_1 to have 3 dimensions, but got array with shape (1, 3)
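For what it's worth, a rough sketch of a generator matched to the model above: since the final LSTM uses return_sequences=True, its output (and therefore the target) is 3-dimensional, (1, 3, num_classes) rather than (1, 3). readIMG, filenames, and num_classes are taken from the snippets above; the zero targets are placeholders:

import numpy as np

def image_sequence_generator(filenames, num_classes, seq_len=3):
    while True:
        for start in range(len(filenames) - seq_len + 1):
            frames = [readIMG(filenames[start + i]) for i in range(seq_len)]  # each (540, 960, 1)
            x = np.array([frames])                   # (1, 3, 540, 960, 1)
            y = np.zeros((1, seq_len, num_classes))  # replace with real per-timestep targets
            yield x, y

# model.fit_generator(image_sequence_generator(filenames, num_classes),
#                     steps_per_epoch=len(filenames) - 2, epochs=10)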

@eromoe

eromoe commented Mar 1, 2018

@bstriner Thank you for your effort.

So, for example, lets say you want to train a language model on 6-word sequences, and you have a vector of words of length n. There are k unique words. First, one-hot encode the words (n,) -> (n, k). Then, roll and concatenate the array to itself (n,k) -> (n-5, 6, k). This array has every 6-word sequence in the data. Each word is encoded as k features (one-hot).

I think this is the key point of LSTM input.

But this raises a question for me: what is the proper way to pad timestep sequences when the step length is larger than the shortest sequence?

For example:
I have one million sentences and two word2vec models to train.
The longest sentence has 100+ characters, while the shortest has only 4 (this is normal in Chinese).
I want to predict the next word from both the preceding 20 characters and the following 20 characters, so the time_step is 20.
Many sentences are shorter than 20 characters.
So we need to pad all chunks of a sentence to length 20.

Pad flow:
input: ['a', 'b', 'c', 'd']
preprocessed forward: ['', .., 'a'], ['', .., 'a', 'b'], ['', .., 'a', 'b', 'c'], ['', .., 'a', 'b', 'c', 'd']

As you can see, there are a lot of blanks, which I think is not efficient. Is there a dynamic way to do this, or in other words, is there a best practice for variable-length input?

@gaworecki5

I do not understand how to format my input matrix as you are describing, @bstriner. I have 702 rows of weekly data with 126 features in my training dataset. I would like my LSTM to train on 12-week (quarterly) windows of the data, and from what I understand you to say, I want an input of size (690, 12, 126), but that is not an allowable size to reshape my original dataset of size (702, 126) into.

Please help, I cannot seem to figure this out!

Thanks so much for the future replies.
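Going by the overlapping-window approach described earlier in the thread, the 3-D array is built by slicing rather than by np.reshape; a rough sketch, with a random stand-in for the (702, 126) weekly data:

import numpy as np

window = 12  # one quarter of weekly data
weekly = np.random.rand(702, 126).astype('float32')  # stand-in for the real dataset

n_windows = len(weekly) - window + 1                 # 691 overlapping 12-week windows
x = np.stack([weekly[i:i + window] for i in range(n_windows)])
print(x.shape)  # (691, 12, 126)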

@JIDDADAWUDJIDDA

How can I leverage a GRU to map an input sequence to a fixed-size vector, treating the GRU as unrolled over a series of time slices?

@vignesh-j-shetty

vignesh-j-shetty commented Oct 22, 2020

@bstriner I have a question: if there is a Dense layer after the LSTM, will the output of that Dense layer depend on all timesteps or only on the last timestep?
