It worked amazingly well, with a few tweaks.
Let’s dive into those tweaks and the architecture we need to optimize for sequential data.
WaveNet Architecture

Gated Activations and Skip Connections

What we see above is a Gated Activation.
Similar to the gates in LSTMs or GRUs, the tanh branch is an activation filter, modifying the output of the dilated convolution just below it.
It’s the “squashing function” we’ve seen in CNNs before.
The sigmoid branch serves essentially as a binary gate that can suppress everything upstream of it; it learns which data is important, reaching back an arbitrary number of periods into the past.
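The two branches above can be sketched in Keras like this, following the WaveNet paper's gated activation z = tanh(W_f * x) ⊙ σ(W_g * x). The function name, filter count, and kernel size here are illustrative choices, not values from the article:

```python
import tensorflow as tf
from tensorflow.keras import layers

def gated_activation(x, filters=32, kernel_size=2, dilation_rate=1):
    # tanh branch: the "squashing" filter over a dilated causal convolution
    filt = layers.Conv1D(filters, kernel_size, padding="causal",
                         dilation_rate=dilation_rate, activation="tanh")(x)
    # sigmoid branch: a soft gate in [0, 1] that can cancel the filtered signal
    gate = layers.Conv1D(filters, kernel_size, padding="causal",
                         dilation_rate=dilation_rate, activation="sigmoid")(x)
    # element-wise product: the gate decides how much of each filter passes
    return layers.Multiply()([filt, gate])
```

With `dilation_rate` doubling from block to block, the receptive field (how many past periods the gate can see) grows exponentially with depth.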
Also note the grey arrows pointing right: these are Skip Connections.
They allow a complete bypass of convolution layers, giving raw data the ability to influence the formulation of predictions directly, again across an arbitrary number of layers ahead.
Don’t worry, these are hyper-parameters that you can validate on slices of your data.
Optimal values depend on the structure and complexity of the sequence you learn.
Remember, in fully-connected NNs, a neuron takes inputs from all neurons in the previous layer: early layers feed later ones via a hierarchy of intermediate computations.
This allows NNs to build complex interactions of raw inputs/signals.
But… what if raw inputs are directly useful for prediction, and we want them to influence the output directly? In detail, skip connections allow the output of any layer to bypass multiple later layers, avoiding the dilution of its influence. Keras allows us to store the tensor output of each convolutional block, in addition to passing it through further layers, as skips.
Note how for each block in the stack above, the output from the gated activations joins the set of skip connections.
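Collecting the skip path in Keras can be sketched as below. The block count, dilation schedule, and filter sizes are assumptions for illustration, not the article's exact settings:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_stack(seq_len=32, filters=16):
    inp = layers.Input(shape=(seq_len, 1))
    # 1x1 convolution projects the raw series to `filters` channels
    x = layers.Conv1D(filters, 1, padding="same")(inp)
    skips = []
    for dilation in (1, 2, 4, 8):  # exponentially growing receptive field
        # gated activation: tanh filter * sigmoid gate
        filt = layers.Conv1D(filters, 2, padding="causal",
                             dilation_rate=dilation, activation="tanh")(x)
        gate = layers.Conv1D(filters, 2, padding="causal",
                             dilation_rate=dilation, activation="sigmoid")(x)
        x = layers.Multiply()([filt, gate])
        skips.append(x)  # store this block's output for the skip path
    # every block's output rejoins here, bypassing all later blocks
    merged = layers.Activation("relu")(layers.Add()(skips))
    out = layers.Conv1D(1, 1)(merged)  # prediction head
    return tf.keras.Model(inp, out)
```

The key line is `skips.append(x)`: the functional API lets us hold onto each block's tensor and sum them all before the prediction head.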
Residual Connections

Residual connections are similar to skip connections: think of them as consistently-available short layer skips! We’ll use a one-layer skip for our model, but the skip length is also a hyper-parameter.
Why they help is not fully understood, but it is most likely because they mitigate the Vanishing and Exploding Gradient obstacles in backpropagation.
This becomes more important with larger models, but I’ll show you the implementation in a smaller setting for educational purposes.
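A one-layer residual connection is just an `Add` of a block's input and its output. A minimal sketch, with names and sizes of my own choosing rather than the article's:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=16, dilation_rate=1):
    # gated activation over a dilated causal convolution
    filt = layers.Conv1D(filters, 2, padding="causal",
                         dilation_rate=dilation_rate, activation="tanh")(x)
    gate = layers.Conv1D(filters, 2, padding="causal",
                         dilation_rate=dilation_rate, activation="sigmoid")(x)
    gated = layers.Multiply()([filt, gate])
    # 1x1 convolution keeps the channel count compatible for the addition
    out = layers.Conv1D(filters, 1)(gated)
    # residual connection: the input skips one layer and is added back,
    # giving gradients a short path during backpropagation
    return layers.Add()([x, out])
```

Note that `x` must already have `filters` channels for the addition to work, which is why stacks like this usually start with a 1x1 projection convolution.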
My Results

As you can see in the Notebook, my results are very good compared to Facebook Prophet:

Prophet Mean Absolute Error (MAE): 8.04
WaveNet MAE on the validation set: ~1.5

[Figure: MSFT Volume Prediction]

There are some tricky trends though, and no easy answers when it comes to stock prediction:

[Figure: MSFT Adjusted Close Prediction]

The overall trend fools WaveNet into a strong regression, as opposed to following local momentum.
Alas, no easy money in the markets! However, I haven’t tuned my model’s hyper-parameters, and my training set was quite limited.
Can we do better? Let me know what results you get!