I used ARIMA and SARIMAX models from StatsModels and the popular Facebook Prophet model as well.

The first thing I looked at was autocorrelation and partial autocorrelation plots.

In time series we use previous data points (lags as we call them) to predict what will happen next.

So if we’re just looking at 1 lag, we’ll predict that today has some weighted correlation with tomorrow (e.g. a regression model could say tomorrow will be 0.5 (the beta value) times today’s value plus an error term).

These plots help us determine what are likely to be the most important lags.

Here are the plots for our VIX data:

The first thing to notice is that there isn’t a strong correlation at any point beyond the series itself.

The lags jump down from a perfect correlation with itself (1.0) to the next lag being around -0.2.

The second thing to notice is that both graphs are similar.

Partial autocorrelations can show a difference when there’s a lasting impact from multiple lags in the autocorrelation plot on the left.

This isn’t the case for our data; it doesn’t appear to be highly autocorrelated.

Finally, to build your intuition, take a look at the blue shaded region in both graphs. Each point either sticks up out of it or stays inside it. That region is a confidence interval: when points fall inside or near it, those lags likely aren’t worth incorporating into the model.
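For reference, statsmodels’ plot_acf and plot_pacf functions draw these charts, and the shaded band has a simple closed form: for a series of N observations, the approximate 95% band for “no autocorrelation” is plus or minus 1.96 divided by the square root of N. A quick sketch (N here is an assumed sample size, not the actual length of the VIX series):

```python
import numpy as np

# Approximate 95% confidence band on an ACF plot: +/- 1.96 / sqrt(N).
# Lags whose bars stay inside this band are statistically
# indistinguishable from zero autocorrelation.
N = 500  # assumed number of observations
band = 1.96 / np.sqrt(N)
print(round(band, 4))  # 0.0877
```

So with 500 points, any lag whose correlation sits between roughly -0.09 and +0.09 is likely just noise.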

To further help your understanding let’s briefly look at another dataset.

Look at these plots of 20 years of monthly temperature data to see what clearly autocorrelated data looks like:

Monthly temperature ACF & PACF plots

You can see the spikes around lags 12 and 24 on the autocorrelation graph. Those correspond to the same month each year, which correlates strongly with that month’s temperature in the other years of our 20-year dataset.

The partial autocorrelation chart here suggests that once you filter out the effect of each intermediary lag, the lags with real impact are roughly 1–12; beyond that the values start to land close to the shaded region.

If you want a more in-depth look at this, read Duke’s time series guide, which does a great job with deeper explanations.

For the purposes of this tutorial, the main takeaway is that each year the weather correlates with the previous year.

This doesn’t appear to be the case for the VIX!

Auto-regressive model

So for our modeling process we’ll start with the simplest model, an AR(1) model, which uses just the last data point in the regression to predict the next one.

We’ll split our data into a training, validation, and test set (using the test set only once, at the very end).

For time series we can’t use KFold or other fancier cross-validation methods; we have to preserve chronological order in our splits.
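An ordered split just slices the series chronologically. A minimal sketch of the idea (this is not the author’s split_data implementation, which also splits the prime-rate series):

```python
import numpy as np

def ordered_split(series, n_validation, n_test):
    """Chronological split into train | validation | test, order preserved."""
    series = np.asarray(series)
    n_train = len(series) - n_validation - n_test
    train = series[:n_train]
    validation = series[n_train:n_train + n_validation]
    test = series[n_train + n_validation:]
    return train, validation, test

data = np.arange(10)
train, validation, test = ordered_split(data, n_validation=2, n_test=2)
print(train, validation, test)  # [0 1 2 3 4 5] [6 7] [8 9]
```

The key point is that no shuffling happens: the validation and test sets are always strictly later in time than the training set.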

train_vix, train_prime, validation, validation_prime, test = split_data(weekly_pct)

We’ll then go through the normal ML process:

1. Import the ARIMA model
2. Fit our model with our training data and parameters
3. Score our model to see how well it does against validation data

That should feel like many of the other libraries you’re using.

You can see the code below:

from statsmodels.tsa.arima_model import ARIMA  # import model

model = ARIMA(train, order=(1,0,0)).fit()  # fit training data
preds = model.forecast(52*2)[0]            # predict
RMSE(validation, preds)                    # score

Note I’m predicting 104 weeks out: I set my validation set to be 2 years long, rather than taking 20% of the data, to avoid getting too close to the crazy spike we saw in 2008.

I also chose the length of the validation set based on a more computationally intensive technique I’ll use next.

The first model didn’t do very well, scoring an RMSE of 0.04217, when the standard deviation of the data is 0.028008.

So far we’d do better just guessing the mean.
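That baseline is easy to check: predicting the mean of a series gives an RMSE equal to its standard deviation, so any useful model should beat that number. A sketch, with the RMSE helper implemented as plain root-mean-squared error (an assumption, since the author’s helper isn’t shown) and random data standing in for the validation set:

```python
import numpy as np

def RMSE(actual, predicted):
    """Root-mean-squared error between two arrays."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))

rng = np.random.default_rng(2)
validation = rng.normal(0, 0.028, size=104)  # stand-in for weekly % changes

# Always predicting the mean yields an RMSE equal to the data's std dev
baseline = RMSE(validation, np.full(104, validation.mean()))
print(baseline)
```

Here the AR(1) model’s 0.04217 is well above the data’s 0.028008 standard deviation, which is why it loses to the mean-guessing baseline.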

AR time window refitting

The next technique I tried was more computationally intensive.

Instead of predicting the next 104 points, I decided to fit the model, predict 1 point out, then refit it and repeat that process 104 times.

The further into the future you go, the less predictive power you have, so I thought this would help.

Here’s a graphic I created to help explain this technique:

Loop and fit one more data point before each prediction to improve the model

And the code for the function below:

def window_refitting_model(df, p, q):
    preds = []
    df = list(df)
    for i in validation:
        model = ARIMA(df, order=(p,0,q)).fit()
        pred = model.forecast()[0][0]
        preds.append(pred)
        df.append(i)
    plt.plot(validation)
    plt.plot(preds)
    rmse = RMSE(validation, np.array(preds))
    print(rmse)
    return rmse, validation, preds

window_refitting_model(train, 1, 0)

This outputs a slightly better score, a ~1% improvement in RMSE.

ARIMA

So what’s next? So far I’d only used the first parameter (p) of the ARIMA model, which made it an auto-regressive model.

Next I tried to optimize the p and q parameters for the StatsModels ARIMA package.

The p parameter tells the model how many lags back to look (in case my autocorrelation plots weren’t correct), and the q parameter sets how many moving-average terms to include, which can improve the model by smoothing out noisy variation.

After some computation I found that an ARMA(8,2) model (p=8, q=2) does slightly better than my AR(1) model, a 0.047% improvement.

These improvements may not be worth using in production, but given the tough financial data we’re working with, I went ahead with them for the sake of at least finding the best model.
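That search over p and q can be a simple grid, keeping whichever order scores best on validation RMSE. A sketch with a pluggable score function (in the real version, score_fn would fit ARIMA(train, order=(p, 0, q)) and return its validation RMSE; the toy scorer below just exercises the search):

```python
def grid_search_pq(score_fn, max_p=8, max_q=2):
    """Try every (p, q) pair and keep the one with the lowest score.

    Returns (best_score, best_p, best_q).
    """
    best = None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            score = score_fn(p, q)
            if best is None or score < best[0]:
                best = (score, p, q)
    return best

# Toy scorer whose minimum sits at (8, 2), mimicking the ARMA(8,2) result
best = grid_search_pq(lambda p, q: (p - 8) ** 2 + (q - 2) ** 2)
print(best)  # (0, 8, 2)
```

Note this fits (max_p + 1) * (max_q + 1) models, which is why the search gets slow quickly for financial series of any length.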

When I then turn around and use the window refitting technique on the ARMA(8,2), I get an RMSE of 0.04117, a 2.37% improvement.

One note before we move on: I didn’t try the middle parameter of the ARIMA model (the differencing parameter, d) because I had already transformed my data into percent changes. For other datasets it could be worth optimizing this parameter too.

See the documentation for more information.
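To make the connection concrete: first-order differencing (d=1) models the change between consecutive values, while the percent-change transform used here scales that change by the previous value. A tiny illustration with made-up prices:

```python
import numpy as np

prices = np.array([100.0, 110.0, 99.0, 108.9])

diffs = np.diff(prices)              # what d=1 differencing would model
pct = np.diff(prices) / prices[:-1]  # the percent-change transform used here

print(diffs)  # [ 10.  -11.    9.9]
print(pct)    # [ 0.1  -0.1   0.1]
```

Both remove a trending level from the series; percent change additionally makes moves comparable across different price levels, which is why no further differencing was needed.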

SARIMAX

The idea of the SARIMAX model is to do the same things as ARIMA, but also add the ability to model a seasonal dimension with a second set of (p,d,q) parameters.

This helps if some number of lags predicts your data but there’s also autocorrelation at a different interval (e.g. lags 1–5 and lags 30–50 really matter).

Finally, the X stands for exogenous data, which can be any feature you think could improve the model.

Given the lack of any seasonal indicators in our data I tried a few parameters with the second set of (p,d,q) parameters but didn’t see any improvements.

I then went ahead and added in exogenous data (my Federal prime interest rate data) and scored the model.

import statsmodels.api as sm

validation_prime_df = pd.DataFrame(validation_prime)
sar = sm.tsa.statespace.SARIMAX(train_vix, exog=train_prime, order=(1,0,0),
                                seasonal_order=(0,0,0,12), trend='c').fit()
pred = sar.forecast(52*2, exog=validation_prime_df)
RMSE(validation, pred)

Think the Prime Rate would help? It didn’t: it scored the same as my baseline AR(1) model, with an RMSE of 0.04217.

Facebook Prophet

Facebook has a very popular time series library that I’d heard great things about from friends in industry.

While ARIMA and SARIMAX require lots of hyperparameter tuning, the Prophet model does all of that mostly under the hood.

You do have to massage your data to fit into their formats but it isn’t too much work and you can see all of my code in the Facebook Prophet Jupyter Notebook in the repo.
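For reference, Prophet expects a two-column DataFrame named ds (datestamps) and y (the value to forecast). A minimal sketch of that reshaping, using a toy weekly series in place of the real VIX data:

```python
import pandas as pd

# Toy weekly percent-change series with a DatetimeIndex,
# standing in for the real weekly VIX data
weekly_pct = pd.Series(
    [0.01, -0.02, 0.03],
    index=pd.date_range("2015-01-04", periods=3, freq="W"),
)

# Prophet wants columns named exactly 'ds' and 'y'
prophet_df = weekly_pct.rename_axis("ds").reset_index(name="y")
print(list(prophet_df.columns))  # ['ds', 'y']
```

With the prophet package installed, Prophet().fit(prophet_df) would then train the model on this frame.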

One nice thing about their library is the visualizations it provides.

While my model ended up doing worse with Facebook Prophet (-0.95%), it did provide some interpretability with these graphs. Here is a nice graphic of FB’s predictions for my validation dataset:

Facebook Prophet’s yearly trend graphic, suggesting trends in my validation dataset.

Summary

To finish my project I created a summary table of the major models and results I got, as well as a graph of my best model’s predictions vs. the actual validation dataset.

Then I scored my best model on my test set (used only once, at the very end) and would use that as an indicator of the expected error in my model.

My conclusion: I certainly wouldn’t use this model to trade the VIX!

Finally, I did learn a few things about financial data that are worth discussing:

- Time series analysis is challenging on highly analyzed market indices like the VIX.
- I picked the Federal prime rate based on theories about how it predicts market volatility, but it isn’t a fast-moving economic indicator; it changed infrequently compared to the VIX, so it had no impact on my model.

Hopefully you’ve been able to follow along in the notebooks and have learned a lot about time series analysis.

If you have any questions, don’t hesitate to reach out to me here, on GitHub or on Twitter.
