I am trying to forecast ambient temperature from measured data in Python. The data is sampled every 15 minutes. To predict future values, I am using a simple autoregressive (AR) model, and I have tried different orders. I have also split the data into train and test datasets, with the first 70% of the data used for training and the remaining 30% for testing. Here is a plot of the whole data.
The problem is that the predictions deviate quite a lot from the real values. Moreover, the prediction converges to a constant value after some steps, which makes it useless. You can see an example of the prediction vs the real data here.
As you can see, the AR model seems to capture the periodic behavior of the real data, but the forecast is attenuated: its peaks get smaller and smaller.
Here is the code of my implementation in Python:
from statsmodels.tsa.arima_model import AR
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
import pandas as pd

data = pd.read_excel('one_year_data_celsius.xlsx')

# split dataset
Nparam = 100
X = data.TAmb_meas
upto = int(len(X)*0.30)
train, test = X[1:len(X)-upto], X[len(X)-upto+1:]

# train autoregression model
model = AR(endog=train,dates=train.index,freq='15min')
model_fit = model.fit(maxlag=Nparam)
until = 24*4*14
predictions = model_fit.predict(start=train.index[-1], end=test.index[until-1], dynamic=False)

pyplot.plot(test[0:until])
pyplot.plot(predictions, color='green')
pyplot.legend(['real data','prediction'])
I would like to know: is this behavior of the AR model normal, or am I making a silly mistake somewhere? Or is the behavior related to the fact that I am trying to predict quite far ahead (1344 steps, although the prediction is already poor after 100 steps)? I have tried different AR orders, but the only "improvement" I get is a smaller "attenuation".
I apologize in advance for any beginner mistakes I may have made, since I am new to this topic.
The issue with your ARIMA model is that it is making the prediction based on the values in the early stages of the time series.
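To see what a long out-of-sample AR forecast is built from, here is a minimal sketch. It is not your fitted model: the AR(2) coefficients, the mean of 15 °C, and the two starting values are all hypothetical, chosen so the model has a daily cycle (96 samples) and is stationary. Beyond the end of the training data, each forecast step is computed from earlier forecasts, so for a stationary model the swings shrink and the forecast settles at the mean.

```python
import math

def ar2_forecast(last_two, mean, r, period, steps):
    """Iterate an AR(2) forecast recursion around a fixed mean.

    The coefficients are chosen (hypothetically) so the model has a pair
    of complex roots with modulus r < 1 and the given period in samples.
    """
    omega = 2 * math.pi / period
    phi1 = 2 * r * math.cos(omega)
    phi2 = -r * r
    # Work with deviations from the mean; history[-1] is the latest value.
    history = [last_two[0] - mean, last_two[1] - mean]
    forecast = []
    for _ in range(steps):
        # Each new forecast depends only on the two previous values,
        # which are themselves forecasts after the first two steps.
        nxt = phi1 * history[-1] + phi2 * history[-2]
        history.append(nxt)
        forecast.append(nxt + mean)
    return forecast

# Hypothetical numbers: 15-minute data (96 samples per day), mean of 15 °C,
# and the same 14-day horizon as in the question.
forecast = ar2_forecast(last_two=(19.8, 20.0), mean=15.0, r=0.995,
                        period=96, steps=24 * 4 * 14)

day = 96
first_day_swing = max(forecast[:day]) - min(forecast[:day])
last_day_swing = max(forecast[-day:]) - min(forecast[-day:])
print(f"swing on day 1:  {first_day_swing:.2f}")
print(f"swing on day 14: {last_day_swing:.4f}")
```

The closer `r` is to 1, the slower the attenuation, which would be consistent with a higher order only making the attenuation smaller rather than removing it.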
From looking at the graph, there is a "zig-zag" type of pattern, where we see only one change in trend to the upside and the pattern continues as normal.
Have you tested for stationarity yet? If not, you should do so, as that will give you a good indication of whether you are using the correct ARIMA configuration.
I can't really advise any further without seeing the data up close. That said, there are two options you could try:
Use the pyramid package in Python (since renamed pmdarima), which replicates the auto.arima() function in R to estimate the ARIMA configuration automatically based on best fit. You can find more information here: https://www.alkaline-ml.com/pyramid/
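To give a rough idea of what automatic selection does, here is a simplified sketch that picks an AR order by comparing AIC values. It is not the pyramid API (that package exposes an `auto_arima` function which also searches over differencing and MA terms); the helper names `fit_ar_ols` and `select_ar_order` and the synthetic AR(2) series are made up for illustration.

```python
import numpy as np

def fit_ar_ols(y, p, max_p):
    """Fit AR(p) with an intercept by least squares; return (coef, aic).

    All orders use the same targets y[max_p:], so their AIC values
    are comparable.
    """
    target = y[max_p:]
    # Column k holds the series lagged by k samples, aligned with target.
    lags = [y[max_p - k:len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(target))] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = np.mean(resid ** 2)
    aic = len(target) * np.log(sigma2) + 2 * (p + 1)
    return coef, aic

def select_ar_order(y, max_p):
    """Return the order in 1..max_p with the lowest AIC, plus all AICs."""
    aics = {p: fit_ar_ols(y, p, max_p)[1] for p in range(1, max_p + 1)}
    best_p = min(aics, key=aics.get)
    return best_p, aics

# Synthetic AR(2) series standing in for real data (assumption).
rng = np.random.default_rng(42)
y = np.zeros(2000)
for t in range(2, len(y)):
    y[t] = 0.6 * y[t - 1] - 0.5 * y[t - 2] + rng.normal()

best_p, aics = select_ar_order(y, max_p=8)
print("selected order:", best_p)
```

AIC can pick an order somewhat larger than the true one, so treat the selected value as a starting point rather than a final answer.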
If your data is volatile, which appears to be the case here, then you may need a GARCH model to account for that volatility. Here is an example of how this is done: http://www.blackarbs.com/blog/time-series-analysis-in-python-linear-models-to-garch/11/1/2016
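For intuition about what a GARCH model adds, here is a minimal sketch of the GARCH(1,1) conditional variance recursion applied to a residual series. The parameters and residuals are hypothetical; in practice the parameters would be estimated by maximum likelihood with a dedicated package, as in the linked post.

```python
import numpy as np

def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model.

    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1],
    started at the long-run variance omega / (1 - alpha - beta).
    """
    residuals = np.asarray(residuals, dtype=float)
    sigma2 = np.empty(len(residuals))
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(residuals)):
        sigma2[t] = omega + alpha * residuals[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Hypothetical parameters and residuals, purely for illustration.
resid = np.array([0.2, -0.1, 3.0, -2.5, 0.3, 0.1, -0.2, 0.0])
sigma2 = garch11_variance(resid, omega=0.1, alpha=0.1, beta=0.8)
print(np.round(sigma2, 3))
```

The variance estimate rises right after the large residuals and then decays back, which is the volatility clustering a GARCH model is meant to capture. It models the variance of the errors, not the mean, so it would sit on top of an AR or ARIMA model rather than replace it.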