You can see that in the last two years of real data (2017–2019), the growth tapers off.
Since the SARIMA model only has a seasonal AR component of 1 and an MA component of 2, the farthest back it reaches is 2 years.
As a result, it continued the trend of the past 2 years into the future.
Growth Predicted (over 4 years): +3.7% (+260,000 departures/year)

Finally, I used Q-Q plots and plots of the residuals to check that my model did not violate any assumptions of linearity (plots not included in this post, but they are in the notebook on my GitHub).
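As a rough illustration of what a Q-Q check computes, the sketch below pairs sorted, standardized residuals with the matching quantiles of a standard normal distribution. The function name `qq_points` and the sample residuals are made up for illustration; the notebook presumably used a plotting library rather than this hand-rolled version.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    n = len(residuals)
    mu, sigma = mean(residuals), stdev(residuals)
    # Standardize and sort the residuals: these are the observed quantiles.
    observed = sorted((r - mu) / sigma for r in residuals)
    # Plotting positions (i + 0.5) / n avoid probabilities of exactly 0 or 1.
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))


# Hypothetical residuals, for illustration only (not the SFO data).
example_residuals = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.6, 1.1, -0.4]
points = qq_points(example_residuals)
```

If the residuals are roughly normal, the pairs fall close to the line y = x; systematic curvature at the ends suggests heavy tails or skew.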
Facebook Prophet
Next, I wanted to use Facebook Prophet to conduct the same analysis.
Facebook Prophet is an open-source time series tool released by the Facebook team in early 2017.
It does more computational work on the back-end in order to generate higher-quality time series analysis than more traditional methods like SARIMA.
For instance, Prophet will look for multiple levels of seasonality (daily, weekly, yearly).
It is also designed to handle outliers, non-linear seasonality, and holidays, which makes it more flexible than SARIMA.
FB Prophet also has the advantage of being easy to use.
Once your data is in the proper format, you can plug in data and see what kind of prediction you get.
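For reference, the "proper format" Prophet expects is a two-column table: `ds` for the date and `y` for the value. The sketch below is one possible way to get there and run a forecast, not the code from my notebook. The helper names `to_prophet_rows` and `forecast_with_prophet` are hypothetical, the 48-month horizon is an assumption matching a four-year projection, and the second function assumes the `pandas` and `prophet` packages are installed (older releases were published as `fbprophet`).

```python
from datetime import date


def to_prophet_rows(monthly_counts):
    """Convert {(year, month): count} into Prophet-style rows with 'ds' and 'y'."""
    return [
        {"ds": date(year, month, 1), "y": count}
        for (year, month), count in sorted(monthly_counts.items())
    ]


def forecast_with_prophet(rows, months_ahead=48):
    """Fit Prophet and forecast ahead; needs the pandas and prophet packages."""
    # Imported lazily so the helper above works without these packages.
    import pandas as pd
    from prophet import Prophet

    df = pd.DataFrame(rows)
    df["ds"] = pd.to_datetime(df["ds"])
    model = Prophet()
    model.fit(df)
    # freq="MS" asks for month-start timestamps, matching monthly data.
    future = model.make_future_dataframe(periods=months_ahead, freq="MS")
    return model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Made-up counts, for illustration only.
rows = to_prophet_rows({(2019, 2): 105, (2019, 1): 100})
```

In the returned table, `yhat` is the point forecast and `yhat_lower`/`yhat_upper` bound Prophet's uncertainty interval.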
Mean Absolute Error: 5.4%

We see that Prophet performed about the same as my SARIMA model in predictions, still consistently under-guessing the growth over time.
This leads me to believe that SFO traffic growth may be accelerating in a way that is unexpected in my model’s training data.
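Since the error is reported as a percentage, it is presumably a mean absolute percentage error. The sketch below shows that metric alongside a signed version, which is one way to confirm "consistently under-guessing". Both function names and the sample numbers are illustrative assumptions, not taken from the notebook.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def mean_signed_percentage_error(actual, predicted):
    """Positive values mean the model under-predicts on average."""
    errors = [(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Made-up numbers, for illustration only.
actual = [100, 200, 400]
predicted = [90, 190, 380]
mape = mean_absolute_percentage_error(actual, predicted)
bias = mean_signed_percentage_error(actual, predicted)
```

When every prediction is below the actual value, as in this toy example, the signed error equals the absolute error; a signed error near zero would instead indicate misses in both directions.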
FB Prophet: Predicted much larger growth than SARIMA
However, you can see that when Prophet is retrained on all the data, its future projection extrapolates the overall trend better than SARIMA.
Growth Predicted (over 4 years): +18% (+1,300,000 departures/year)

3. Neural Nets
Finally, I wanted to try to build a neural net that would outperform SARIMA and FB Prophet.
Neural nets are useful because they can capture hidden patterns through hundreds of iterations that would be missed by a human designing a parametric method like SARIMA.
However, I had a problem.
Because neural nets can easily overfit, I needed to manually separate a validation set, which was not necessary for either SARIMA (since I used AIC score to determine the best model’s hyperparameters), or FB Prophet (since it runs once, and there were no hyperparameters I was tuning).
Train set: ’05–’12
Validation set: ’13–’14
Test set: ’15–’19
Now our training data is getting farther and farther from the range I’m actually going to be predicting: 2019–2023.
I am also operating with small data.
My new train set only has ~80 data points (one point for each month), far fewer than neural nets typically need to train effectively.
The possibility of an accurate neural net was looking less and less promising for my time series analysis.
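A split like this has to be chronological rather than shuffled, so that the model never trains on months that come after the ones it is validated or tested on. Here is a minimal sketch using the year boundaries above; the function name `chronological_split` and the placeholder series are assumptions for illustration. The placeholder covers every month from 2005 through 2019, so its counts will not match the ~80 training points in my actual data.

```python
from datetime import date


def chronological_split(series, train_end_year=2012, val_end_year=2014):
    """Split (date, value) pairs by year without shuffling."""
    train = [(d, v) for d, v in series if d.year <= train_end_year]
    val = [(d, v) for d, v in series if train_end_year < d.year <= val_end_year]
    test = [(d, v) for d, v in series if d.year > val_end_year]
    return train, val, test


# Placeholder monthly series covering 2005-2019 with dummy values.
series = [(date(y, m, 1), 0.0) for y in range(2005, 2020) for m in range(1, 13)]
train, val, test = chronological_split(series)
```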
I was going to try 3 different kinds of neural nets, in order to see which one would handle my problem most effectively.
3 Kinds of Neural Nets
LSTM (Long Short-Term Memory): Incorporates memory gates to “forget” data that may be old, which is particularly useful for time series analysis. An oldie but a goodie, these have been around since the late ’90s.
CNN-LSTM: Adds a Convolutional Neural Net on top of an LSTM. Though CNNs are frequently used in image recognition, in time series they can extract patterns (fine-grained types of seasonality) that can then be rolled into the LSTM.
GRU (Gated Recurrent Unit): Newer architecture that simplifies the LSTM by reducing the number of gates from 3 to 2. Works well with small data.
When training my models, I made sure to score my models in every epoch on their performance on the validation set, in order to combat overfitting to the train set.
I tested each architecture, and the GRU far outperformed the others.
The best GRU model for my problem had 2 GRU layers, followed by 3 Dense layers of decreasing size (without any dropout layers).
This architecture made me suspicious of overfitting.
However, the graph of the test predictions below (and my eventual 2019–2023 prediction) was smooth enough to make me confident that my model was not overfitting.
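For readers who want to picture the architecture, here is a hedged Keras sketch of two GRU layers followed by three Dense layers of decreasing size. The layer widths (64, 32, 16, 8, 1), the activations, the optimizer, and the loss are all assumptions chosen for illustration, not the values from my notebook, and `build_gru_model` requires TensorFlow to be installed. The `make_windows` helper (also a hypothetical name) shows how a series can be cut into 12-month inputs with the following month as the target.

```python
def build_gru_model(window=12, n_features=1):
    """Two GRU layers followed by three Dense layers of decreasing size."""
    # Imported lazily so this file loads even without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(window, n_features)),
        # return_sequences=True passes the full sequence to the second GRU.
        keras.layers.GRU(64, return_sequences=True),
        keras.layers.GRU(32),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1),  # next month's (scaled) value
    ])
    model.compile(optimizer="adam", loss="mae")
    return model


def make_windows(values, window=12):
    """Turn a series into (previous `window` values, next value) pairs."""
    inputs = [values[i:i + window] for i in range(len(values) - window)]
    targets = [values[i + window] for i in range(len(values) - window)]
    return inputs, targets


# Toy series 0..14, for illustration only.
inputs, targets = make_windows(list(range(15)), window=12)
```

Before training, the windows would need to be converted to arrays of shape (samples, 12, 1). Passing `validation_data=(X_val, y_val)` to `model.fit` makes Keras report the validation loss after every epoch, which is the per-epoch check described above.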
After fitting my GRU, when making predictions, I gave it 12 months of data and asked it to predict the next month.
I would then take that prediction, slide the 12-month window forward to include it, and make the next prediction based on that window.
This means that after 12 months of predictions, my GRU was predicting on entirely synthetic data.
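The rolling procedure can be sketched in a few lines. This is an illustration of the idea rather than my notebook code: `recursive_forecast` and `naive_model` are hypothetical names, and `naive_model` is only a stand-in for the trained GRU's one-step prediction.

```python
def recursive_forecast(history, predict_next, steps, window=12):
    """Forecast `steps` months ahead, feeding each prediction back in."""
    if len(history) < window:
        raise ValueError("need at least `window` observations to start")
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_next(buffer)
        forecasts.append(next_value)
        # Slide the window: drop the oldest month, append the prediction.
        buffer = buffer[1:] + [next_value]
    return forecasts


def naive_model(window_values):
    """Stand-in for the trained GRU: repeats the oldest value in the window."""
    return window_values[0]


history = [float(m) for m in range(1, 13)]  # made-up values for 12 months
forecasts = recursive_forecast(history, naive_model, steps=24)
```

After the first 12 steps the buffer contains no real observations at all, which is why errors can compound in this setup.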
Though the GRU’s performance looks similar to SARIMA and FB Prophet, it actually had a significantly lower error than both.
Mean Absolute Error: 4.6%

This means my neural net had ~20% less error in its prediction than SARIMA and Facebook Prophet, all while training on less data, predicting on itself, and preventing overfitting.
When my GRU was retrained on all available data and made its rolling 1-month predictions based on the previous 12 months of data, it projected the following growth out to 2023:
Growth Predicted (over 4 years): +22% (+1,600,000 departures/year)
The smoothness of these projections made me confident that the newly trained neural net did not overfit on the whole dataset, despite the lack of dropout layers in my neural net architecture.
Conclusion
Since my GRU neural net had the lowest mean absolute error (outperforming Facebook Prophet and SARIMA by ~20%), we can be most confident in its predictions of the future.
My GRU predicts that SFO will have an additional 1.6 million departures per year, with a total growth of 22% more annual departures by the year 2023.
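As a quick sanity check on the three projections, each model's percentage growth and absolute increase should imply roughly the same starting level of annual departures. The sketch below does that arithmetic using only the rounded figures quoted in this post; the baseline itself is derived here and is not a number from the dataset, and `implied_baseline` is a hypothetical helper name.

```python
def implied_baseline(added_departures, growth_fraction):
    """Baseline annual departures implied by an absolute and relative increase."""
    return added_departures / growth_fraction


# (additional departures per year, growth as a fraction), as quoted in the post.
projections = {
    "SARIMA": (260_000, 0.037),
    "FB Prophet": (1_300_000, 0.18),
    "GRU": (1_600_000, 0.22),
}
baselines = {
    name: implied_baseline(added, growth)
    for name, (added, growth) in projections.items()
}
```

All three land between about 7.0 and 7.3 million departures per year, so the figures are consistent with one another once rounding is taken into account.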
Future Work
In the future, I’d like to check for patterns of seasonality on the differences between arrivals and departures from SFO, for each airline.
I can then see if there is a way to predict what airline is likely to have the most open seats for a particular month.
For example, perhaps United typically has more departing passengers than arriving passengers in SFO over the holidays, indicating that their planes will be very full on the way out of SFO.
But maybe JetBlue has a more advantageous arrival/departure ratio in the holiday months, meaning that you should travel with JetBlue over the holidays if you’re looking to maximize your chances of being on an empty plane.
The above pattern could hold constant across time, or it could invert for other months.
I would then be able to give month-by-month recommendations for what airline to fly in the future to maximize the chance of being on an empty plane (in order to maximize leg room and free carry-on baggage space).
Thanks for reading!
If you want to keep up with my work, feel free to check out my GitHub.