Input and target format for multidimensional time-series regression #4870
If you want a prediction for each timestep, set return_sequences=True when creating the LSTM. If you want to run the same dense model at each timestep after the LSTM, use TimeDistributed(Dense(11)). Run the below so you can see if your shapes all line up:
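A minimal sketch of that shape check, assuming the 9-timestep, 11-feature setup discussed below (the original snippet was lost from the thread):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

model = Sequential()
# return_sequences=True gives one output per timestep instead of one per sequence
model.add(LSTM(32, input_shape=(9, 11), return_sequences=True))
# the same Dense(11) is applied independently at every timestep
model.add(TimeDistributed(Dense(11)))
model.compile(loss='mse', optimizer='adam')

# print each layer's output shape to check everything lines up
for layer in model.layers:
    print(layer.name, layer.output_shape)   # (None, 9, 32) then (None, 9, 11)

x = np.random.random((1, 9, 11))   # (samples, timesteps, features)
y = np.random.random((1, 9, 11))   # targets must match the model's output shape
model.fit(x, y, epochs=1)
```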
Cheers,
Thank you for the reply, Ben. First, I'm not sure I understand what a "sample" means in the Keras context. I've been thinking one sample = one time-step, but it seems to mean something else, as a sample can have multiple vectors, each of multiple dimensions, whereas I expect only one vector of multiple dimensions for each time-step. Maybe sample actually means batch? (A sampling of input vectors?) Second, in my single-vector example above, I see the following shape outputs from your code:
I don't know how to reshape to (None, 11); the reshape code I tried does not appear to do it.
In my multivector example, I see the same pattern of shapes:
But in this case it results in the error above, even though (1, 11) != (None, 11) and (9, 11) != (None, 11), and I don't understand your suggestions above. To zoom back to the big picture: once I get the model to fit the data, I'll want to seed the model with a real data time-step, and then use each prediction (eventually with integer clamping) as the next input to recreate the whole sequence:

prediction = data[0]  # first input vector
The first dimension of your data is the batch dimension. It will show up as None in the model's shape printouts. LSTMs in Keras are typically used on 3d data (batch dimension, timesteps, features). So, if your input shape is (None, 9, 11) and your actual input shape is (1, 9, 11), that means your batch dimension is 1. If your output shape is (None, 11), then your actual targets need to be (1, 11). The loop you're describing isn't the way to do things in Keras. Run a single tensor with shape (number of sequences, steps, features) to calculate the entire series in one go. That way errors can backpropagate through time. Are you trying to do time-series prediction? I'm not sure what you're trying to build. Basic timeseries data has an input shape (number of sequences, steps, features). Target is (number of sequences, steps, targets). Use an LSTM with return_sequences=True. Probably skip the dense layer for now until you have the basics working. Cheers,
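For concreteness, a minimal sketch of that advice, assuming the single series of 10 vectors with 11 features from the original post; the target at each step is simply the next input vector:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

data = np.random.random((10, 11))     # one series: 10 time-steps, 11 features

# shift the series by one step so each input predicts the next vector
x = data[:-1].reshape(1, 9, 11)       # (number of sequences, steps, features)
y = data[1:].reshape(1, 9, 11)        # same shape: one target per timestep

model = Sequential()
model.add(LSTM(11, input_shape=(9, 11), return_sequences=True))
model.compile(loss='mse', optimizer='adam')

# the whole series is processed in one call, so errors backpropagate through time
model.fit(x, y, epochs=10)
```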
My working code (essentially the shifted-series setup sketched above) now runs with very low loss. Thanks for the help. I still think a fit function for large sequences that does automatic randomly shuffled batches would be a nice feature, rather than storing redundant information.
I still think it would be nice if a Sequential.fit() variant were provided that manages input/target pairs automatically when the length and shape of input and output are the same, as described in my initial post.
What if you have a train and test set of very different lengths? It works for training, but it doesn't validate: since the length of my test series is 200, it doesn't work anymore, because the model is expecting the longer time series of 1870. Thanks, I'll appreciate the help.
Can you please clarify what you are trying to do with your model? Thanks.
@galfaroi Ideally you should break out overlapping windows. Instead of one sequence of 1870, you could have many sequences of, let's say, 20. Your sequences should be overlapping windows [0-20], [1-21], [2-22], etc., so your final shape would be something like (1850, 20, 14). Same process for your test data: break it into subsequences of the same length as training. You will have to play around with finding a good subsequence length. It is extremely important to have many different ways of slicing your data. If you train on just one super long sequence it will probably not learn anything interesting. Also, depending on what you decide to do, it may be better to generate subsequences as part of a generator function instead of doing it ahead of time. Check the Keras docs for how to write a generator.
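A sketch of that overlapping-window slicing, assuming one series of shape (1870, 14) and a window length of 20:

```python
import numpy as np

def make_windows(series, length):
    """Stack every overlapping window: (n, features) -> (n - length + 1, length, features)."""
    return np.stack([series[i:i + length] for i in range(len(series) - length + 1)])

series = np.random.random((1870, 14))   # one long sequence
x = make_windows(series, 20)
print(x.shape)                          # (1851, 20, 14)
```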
I came across this while searching for a solution to the same problem, but I'm still not 100% sure how to solve my issue. My input array consists of N timesteps of 2 samples with an embedded lag of 22 timesteps. My target is a scalar at each of the N timesteps. When I try to fit, I get an error complaining about 2 input samples and 1 target sample; but isn't LSTM used for scalar outputs? How do I go from multi-dimensional inputs to a scalar output?
@ss32 Dimensions are (samples, timesteps, features). The data you're describing is impossible, so please explain what you are trying to do with 2 input samples and 1 output sample. What is your data supposed to represent? The number of input and output samples has to be the same, because a "sample" is an input and output combination. You might also just be confusing samples and features: 2 input features and 1 output feature is just input (samples, timesteps, 2) and output (samples, timesteps, 1). On a related note, please do not try to pass a 2000-length sequence to your LSTM. It will give you junk (unless you're just looking to make an art project or something). The best strategy is to slice out many subsequences of maybe length 40 (something bigger than your expected time lag), so your shapes are (number of subsequences, 40, features). Cheers
That helps. I was under the impression that a sample would be a reading of the input at some time. Let's call my input A and my output B. I'm trying to predict B given an input of time and A. A and B are strongly correlated, and B has some pseudo-periodic behavior over long enough time scales, so I would like to predict B for some A at a given time. I have found that I can predict B when looking back at the past ~22 values for A, hence the (2,22,N) array.
Maybe LSTM is the wrong layer to use then. I have the model working as an MLP in Matlab and would like to get it implemented in Keras/Tensorflow. Regardless, the type of layer doesn't solve my problem of wanting to use a multi-dimensional input; I have tried using all dense layers as well. I found a similar issue here: #1904 and tried implementing the solution but still ran into the same problem.
@ss32 LSTM is the right layer to use. The problem is that LSTM learns an initial state. If it only ever starts in the same position, it will learn something that might only work from the same starting position. If you train it on different subsequences from different starting positions, then it will try to learn something that will work starting from anywhere. This doesn't require any changes to your Keras model, but it does require changes to your input and output shaping and preprocessing. You have a few choices. The easiest option is to flatten everything (e.g., reshape each (22, 2) lag window into a flat 44-value vector and use dense layers). I don't see where you are running into problems. Please explain what your data is, so we can decide what shape it is supposed to be instead of the other way around. Standard usage:
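The code that followed "Standard usage:" was lost; a plausible minimal sketch with arbitrary example shapes (many subsequences in, one scalar prediction per subsequence):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

x = np.random.random((1000, 40, 2))   # (samples, timesteps, features)
y = np.random.random((1000, 1))       # one scalar target per subsequence

model = Sequential()
model.add(LSTM(32, input_shape=(40, 2)))  # return_sequences=False: one vector per sequence
model.add(Dense(1))                       # map that vector to a scalar output
model.compile(loss='mse', optimizer='adam')
model.fit(x, y, epochs=1)
```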
If you have only one long sequence instead of short sequences, try breaking it up into reasonably sized chunks. Cheers
I see, so I need to break up my N data points into smaller chunks? I managed to get it working using SimpleRNN but the fit is awful. Now it's a matter of tuning the model and embedding the time delay again. My input is now (N,2,1) with target (N, ).
I'm predicting the solar index, Kp, based on the date and the Z component of the Earth's magnetic field, and the data is "continuous" in that it's a giant sequence of readings going back to the 60's. I've found through a Matlab NARXnet that embedding a time delay of 22 timesteps results in the most accurate predictions, so I'm trying to implement the same network using Keras/TF for validation, and it should be less expensive computationally. Matlab uses a recurrent network for time series data. x(t) is my input of [date, bZ] and y(t) is the Kp readings, my targets. The activation function for the hidden layer is basically a sigmoid (Matlab calls it a tansig), followed by a linear activation on the output layer, so this is what I've built into my Keras model. Hopefully that clears things up. Thanks for your help. Edit: one last comment that might clear up the embedding delay: for each input x(t) I need an array of [x(t), x(t-1), x(t-2), ..., x(t-21)], where x is itself 2-dimensional [date, bZ], so the full set of inputs would be an array of size (22, 2, N) for N time steps, and I want to predict the corresponding y(t) for each time step t.
@bstriner do you have example code that works with the dimensions (samples, timesteps, features) you described?
@Tchaikovic did you have a dataset or a problem in mind? All LSTMs in Keras are (samples, timesteps, features), so any LSTM is an example. The important thing when using them is to understand your dataset so you know what is what, and to do any necessary preprocessing or reshaping to make the data the right shape. So, for example, let's say you want to train a language model on 6-word sequences, and you have a vector of words of length n. There are k unique words. First, one-hot encode the words (n,) -> (n, k). Then, roll and concatenate the array with itself (n, k) -> (n-5, 6, k). This array has every 6-word sequence in the data. Each word is encoded as k features (one-hot).
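A sketch of that preprocessing, with a made-up vocabulary of k=50 words:

```python
import numpy as np

n, k = 100, 50
words = np.random.randint(0, k, size=n)   # word ids, a stand-in for real text

onehot = np.eye(k)[words]                 # (n,) -> (n, k)

# roll-and-concatenate: every 6-word window in the data
windows = np.stack([onehot[i:i + 6] for i in range(n - 5)])
print(windows.shape)                      # (95, 6, 50) == (n-5, 6, k)
```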
@bstriner, I am working on timeseries prediction (daily frequency). I am confused by the samples, timesteps, features concept. Let's assume I have 12 features with which to predict my output value, and I have 1 year of data, i.e., 365 rows. Should I reshape my data as (365, 365, 12)? And should one row (a sequence of 12 features) be interpreted as one timestep? I'd appreciate your help with this!
LSTM is probably not going to learn that much over 365 timesteps. Also, you may not be looking for patterns that are 365 timesteps apart. You can shape your data (1, 365, 12) and run it in one go. The problem is that it will probably not generalize or be very meaningful: you are learning a single function that predicts a single batch of data. Ideally, decide that something like 2 weeks is a reasonable amount of data to make a prediction from. Reshape your data as (341, 14, 12), that is, each 2-week subsequence. There will be repetition. However, you are now learning a function that works on every 2-week subsequence, so the function is more likely to generalize to other data. Since this involves repetition, it may be more efficient to generate data on the fly using a generator.
@bstriner Thanks for the explanation :) I was looking exactly for this core idea. However, could you please tell me why you have taken 341? My actual problem is to learn from the past 10 years of data and make predictions for a 4-week window. However, currently I have only 1.5 years of data, so I'm trying to get intuition on how the model works. Should I frame it like (365, 28, 12)? Will the model learn 4-week sequence patterns?
If you have 5 days of data and look at 3-day windows, there are 3 windows (5-3+1). If you have 365 days and 14-day windows, there should be 352 possible sequences; ignore the 341. You want to make a 1-month window prediction based on what? The previous 11 months? Then your input should be 365 days of data. If you have 10 years of data, that is 365*10 possible sequences, so (365*10, 365, 12). If days of the week or which month is part of your model, then you can change things up. The point is that the first dimension is how many sequences, and you want as many as possible so the thing will generalize meaningfully. As an example of what not to do, make a model that is (1, 365*10, 12). That will learn a function that outputs all 10 years of data, once. The problem is, it will only work on that 10 years of data. If you give it just 9 years of data, it might give you something else entirely, because it has only seen one sequence. It might work well on training data but might be junk for other data, because there is in effect only one piece of training data. So, get creative; however you can slice things up to make multiple sequences should make a better model.
Hi @bstriner, hope you don't mind another question! Earlier in this thread there was a discussion about the dimensions of the input and target arrays. My input array is (1500, 100, 2) for 1500 samples, 100 timesteps, and 2 features. My target array is just (1500,), which consists of binary values (what I'm trying to predict). So far the model I've designed has worked, but I was wondering whether this setup makes sense.
@nbucklin "Worked" is hard to define. You can normally get a model to run, but the question is whether it does anything meaningful. Do you have any validation data? From your description, you have 1500 independent sequences, each sequence is 100 timesteps, and each sequence predicts a single value. You are never going to use your model on sequences other than 100 timesteps. If that is correct, you shouldn't run into the types of issues discussed here. The generalization issues discussed here arise when you have one very long sequence and don't break it up into subsequences. You also run into generalization issues when you want to use the model on sequences that are different sizes than what you trained on. If all 1500 are independent sequences, you should set aside a handful as validation. One minor note: if you don't think the model needs all 100 timesteps to make a decision, you can train on shorter subsequences. Cheers
@bstriner Thanks for the advice! That all makes sense to me. Appreciate your continued help in this thread.
Hi bstriner,
Hello everyone, the idea is to build a Keras LSTM model based on these 3 months of data and use the model for speed-profile prediction. We have a real-time speed profile updated every 5 minutes, and the aim is to use that day's speed-profile updates to predict the speed profile 12 steps ahead: take in data every 5 minutes and predict the next 12 steps, and continue like that. Best regards,
Hi all, a fundamental question from my end. I have an input of 48672 x 7 which I broke up into overlapping sequences of 96 time steps, so the input to my LSTM is (48577, 96, 1) and the target is (48577, 96, 1); hence I'm trying to predict the next 96 time-steps using an LSTM network. My first Keras model and my second model both compile and run! So my questions are:
In your description I think you meant the input is (48577, 96, 7).
No particular reason in your input as to why you need to unroll. I'd be interested in what kind of performance differences you get unrolling vs. not, and with different implementation modes.
Thanks for your comments @bstriner; yes, I meant that my input is (48577, 96, 7). Could you please clarify the following points as well?
Could you please clarify an additional point:
Sorry, I forgot to include a point: unroll creates multiple copies of the LSTM cell, one for each time-step, so the number of units (24/96) is unrelated to the number of time-steps, and enabling/disabling unroll has no relation to the number of time-steps. Am I right?
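This can be checked directly; a small sketch with the shapes from this sub-thread (96 timesteps, 7 features, 24 units):

```python
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
# 24 units over 96 timesteps: the unit count is the output width, not the step count
model.add(LSTM(24, input_shape=(96, 7), unroll=True))
print(model.output_shape)   # (None, 24), the same with unroll=False
```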
@bstriner Hello Ben, I would like to ask you a question; hopefully you can help me with it. I am working on a dataset which has multiple time series, each corresponding to a particular location and a particular search query at that location (a location corresponds to a Designated Marketing Area, and a search term could be anything, like air conditioner, sunglasses, etc.). You can see it in a way where location, search query, and timestamp form a primary key. I have encoded the search query and location using one-hot encoding. The data is further appended with weather parameters (snowfall, precipitation, temperature, etc.) and Google Trends data, which are daily time-series features for each row and for all the available time series. Having reached this stage, I am a little confused about how I should represent my data so that I can logically feed it to an LSTM-based neural network. I understand Keras takes a 3-dimensional tensor as input (in the format samples, timesteps, features), and I understand what it means to represent a single time series in that format. But I am not exactly sure how to represent the dataset I described; is an LSTM-based model even a good choice to generalize features over multiple time series as input? I would want to predict the Google Trends data, given the historical data, for multiple timesteps into the future. I am thinking of applying an encoder-decoder LSTM architecture, but first I would like to create a simple model with one LSTM layer followed by a densely connected layer. Thanks for the help in advance!
@bstriner How can I modify your sample code to use a generator to feed data? Input image (3, m*n*k) => LSTM => predict three regression values.
The generator:
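The generator snippet was lost; here is a hedged sketch of what one could look like. The (3, 64) shapes and batch size are assumptions, since the comment above only says the input is an image fed as (3, m*n*k) and the output is three regression values:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def batch_generator(batch_size=32, timesteps=3, features=64):
    """Yield (inputs, targets) batches forever, as fit_generator expects."""
    while True:
        x = np.random.random((batch_size, timesteps, features))  # stand-in for real image data
        y = np.random.random((batch_size, 3))                    # three regression values
        yield x, y

model = Sequential()
model.add(LSTM(32, input_shape=(3, 64)))
model.add(Dense(3))
model.compile(loss='mse', optimizer='adam')
model.fit_generator(batch_generator(), steps_per_epoch=100, epochs=1)
```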
@bstriner Thank you for your effort.
I think this is the key point of LSTM input. But it raises a question for me: what is the proper way to pad timestep sequences when the maximum step length is larger than the shortest one? For example, a pad flow fills the shorter sequences with blanks; as you can see, there are a lot of blanks, which I think is not efficient. Is there a dynamic way to do this, or in other words, is there a best practice for variable-length input?
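The usual answer, as a sketch: pad all sequences to a common length and add a Masking layer so the LSTM skips the padded steps rather than computing on blanks:

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

# three sequences of different lengths, 5 features each
seqs = [np.random.random((n, 5)) for n in (7, 12, 20)]
x = pad_sequences(seqs, maxlen=20, dtype='float32', padding='post', value=0.0)

model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(20, 5)))  # timesteps that are all 0.0 are skipped
model.add(LSTM(16))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
print(model.predict(x).shape)   # (3, 1)
```

Another common option is to group sequences of similar lengths into batches via a generator, so less padding is wasted.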
I do not understand how to format my input matrix like you are describing, @bstriner. I have 702 rows of weekly data with 126 features in my training dataset. I would like my LSTM to train on data in windows of 12 weeks (quarterly), and from what I understand you to say, I want an input of size (691, 12, 126), but that is not a size I can get from my original (702, 126) dataset with a plain reshape. Please help; I cannot seem to figure this out! Thanks so much for the future replies.
How can I leverage a GRU to map the input sequence to …?
@bstriner I have a question: if there is a Dense layer after the LSTM, will the output of that Dense layer depend on all timesteps or only on the last timestep?
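To illustrate in code (not from the original thread): with return_sequences=False the LSTM emits only its final hidden state, so a following Dense layer sees a single vector; that vector still depends on all timesteps, since the recurrent state accumulates over the whole sequence. With return_sequences=True plus TimeDistributed, the Dense layer runs at every timestep:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

m1 = Sequential()
m1.add(LSTM(8, input_shape=(9, 11)))               # final state only: (None, 8)
m1.add(Dense(1))                                   # one prediction per sequence
print(m1.output_shape)                             # (None, 1)

m2 = Sequential()
m2.add(LSTM(8, input_shape=(9, 11), return_sequences=True))  # (None, 9, 8)
m2.add(TimeDistributed(Dense(1)))                  # one prediction per timestep
print(m2.output_shape)                             # (None, 9, 1)
```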
I'm trying to solve, in Keras, a problem I had originally intended for TensorFlow.
I've gotten a lot further using Keras, but I'm still unclear on how best to represent my sequence data. My code works quite well using only one input sample and one target sample.
But I can't get it to work with multiple samples; it fails with this error: ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 9 target samples.
What am I doing wrong with array shape?
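A hedged reconstruction of both attempts (the original snippets were lost); the shapes follow the discussion above, and the error comes from the target's first (sample) dimension not matching the input's:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(11, input_shape=(9, 11), return_sequences=True))
model.compile(loss='mse', optimizer='adam')

data = np.random.random((10, 11))
x = data[:-1].reshape(1, 9, 11)   # 1 input sample

# works: 1 input sample, 1 target sample
y_ok = data[1:].reshape(1, 9, 11)
model.fit(x, y_ok, epochs=1)

# fails: 1 input sample but 9 target "samples"
y_bad = data[1:]                  # shape (9, 11)
model.fit(x, y_bad, epochs=1)     # ValueError: ... 1 input samples and 9 target samples
```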
It would be really nice if Keras facilitated this use case, such that a single data structure holds the sequence and the fitter knows that for each input X_t, the target is X_(t+1). This would provide benefits such as the following:
There would be no redundancy in storing the data and targets separately.
One would not have to be concerned with the shape of the input and targets separately.