This result is already very close to the state-of-the-art accuracy on this dataset.
Creating the Model
To be able to test different models, we need the capability of creating models on the fly.
Meanwhile, we also need to test the model and provide results.
Both needs pointed me to object-oriented programming.
I then created the following class for testing.
I’ll explain the technical details of this and the following section in a separate post.
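To make the idea of creating models on the fly a bit more concrete, here is a minimal, framework-free sketch. It is not the class from the project: the name `ModelSpec` and its methods are hypothetical, and it only describes the layers instead of building a trainable network. It also assumes that `num_layers` counts hidden layers and that the output layer has a single unit.

```python
class ModelSpec:
    """Hypothetical stand-in for a model-building class (not the author's code)."""

    def __init__(self, input_dim=8, num_layers=2, num_units=8,
                 activation="relu", activation_out="sigmoid"):
        if num_layers < 1:
            raise ValueError("num_layers must be at least 1")
        self.input_dim = input_dim
        self.num_layers = num_layers  # assumed to mean hidden layers only
        self.num_units = num_units
        self.activation = activation
        self.activation_out = activation_out

    def layers(self):
        """Return one dict per dense layer: hidden layers, then a 1-unit output."""
        specs = []
        fan_in = self.input_dim
        for _ in range(self.num_layers):
            specs.append({"inputs": fan_in, "units": self.num_units,
                          "activation": self.activation})
            fan_in = self.num_units
        specs.append({"inputs": fan_in, "units": 1,
                      "activation": self.activation_out})
        return specs

    def num_weights(self):
        """Count weights plus biases of the described dense network."""
        return sum(s["inputs"] * s["units"] + s["units"] for s in self.layers())


baseline = ModelSpec()
deeper = ModelSpec(num_layers=4)
print(len(baseline.layers()), baseline.num_weights())
print(len(deeper.layers()), deeper.num_weights())
```

Because every setting is a constructor argument, a test loop can create a fresh model description for each parameter combination. A real version would pass these layer descriptions to a deep learning library.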
Automating the Tests
Since we need to test many different combinations of parameters and save the results, it’s important to automate the test process.
Again, let me show and not tell, since details will be explained in a later post:
Baseline Neural Network Model
Let’s start with a baseline model with the following default parameters:
    input_dim=8
    num_layers=2
    num_units=8
    activation='relu'
    activation_out='sigmoid'
    loss='binary_crossentropy'
    initializer='random_uniform'
    optimizer='adam'
    learning_rate=0.001
    metrics=['accuracy']
    epochs=10
    batch_size=4
    one_hot=False
If we run:
    param_dict_defaults, param_dict = get_defaults(), get_defaults()
    accuracy_baseline = run_test(X=X, y=y, param_dict=param_dict_defaults)
We’ll get:
    Finished cross-valiation. Mean Accuracy: 71.61%, Standard Deviation: 2.92%
It’s not bad, but definitely far from the top result of 77.
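The mean and standard deviation in the output above summarize the accuracies of the individual cross-validation folds. As a rough sketch of that bookkeeping, assuming plain contiguous folds without shuffling or stratification and a population standard deviation, it could look like the following. The helper names `kfold_indices` and `summarize` are hypothetical, and the fold accuracies are made up for illustration.

```python
import statistics


def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous (train, test) index lists."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def summarize(fold_accuracies):
    """Return (mean, population standard deviation), both in percent."""
    mean = statistics.mean(fold_accuracies) * 100
    std = statistics.pstdev(fold_accuracies) * 100
    return mean, std


folds = kfold_indices(n_samples=10, n_folds=3)
# Illustrative accuracies only; these are not results from this post.
mean_acc, std_acc = summarize([0.70, 0.75, 0.68])
print(f"Mean Accuracy: {mean_acc:.2f}%, Standard Deviation: {std_acc:.2f}%")
```

Reporting the spread next to the mean matters here: with a standard deviation of a few percentage points, small differences between two configurations may just be noise.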
Importance of Different Parameters
To understand different parameters’ impacts on model tuning, let’s adjust one parameter at a time while keeping the other parameters constant (unlike an exhaustive search such as GridSearchCV in sklearn).
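Here is a minimal sketch of what one-parameter-at-a-time testing means compared with a full grid. The helper `one_at_a_time` is hypothetical, and the candidate values are illustrative rather than the ones used in this experiment.

```python
import itertools


def one_at_a_time(defaults, candidates):
    """Yield (name, value, params) where only `name` is changed from the defaults."""
    for name, values in candidates.items():
        for value in values:
            params = dict(defaults)  # copy so the defaults stay untouched
            params[name] = value
            yield name, value, params


defaults = {"num_layers": 2, "num_units": 8, "epochs": 10, "batch_size": 4}
# Candidate values are illustrative, not the grid used in this post.
candidates = {"num_layers": [1, 2, 4], "epochs": [10, 20, 40], "batch_size": [4, 16]}

runs = list(one_at_a_time(defaults, candidates))
grid = list(itertools.product(*candidates.values()))
print(len(runs), "one-at-a-time runs vs", len(grid), "grid-search runs")
```

The number of one-at-a-time runs grows with the sum of the candidate counts, while an exhaustive grid grows with their product. The price is that interactions between parameters are never tested.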
Running the tests will provide us with the following results:
First, it’s interesting to note that some parameters not mentioned in the above parameter tuning guideline can be important factors, e.g. optimizer and epochs.
Second, learning rate is indeed among the most impactful parameters.
Third, for this specific experiment (including parameter choices), it seems that the number of layers is more important than the number of hidden units.
This is contrary to the above guideline.
Below is the tuning trend which can be used to find the ranges to tune in.
It’s important to note that the test here is only meant to provide some intuition and shouldn't be taken as formal rules.
This is due to at least two reasons. One, the various parameters and their candidate values are not necessarily comparable. Two, there’s innate randomness in neural networks, so results such as the above plots could change from run to run.
Although it’s highly likely that the interaction between parameter values does matter (i.e. 40 epochs may yield a worse accuracy when paired with a learning rate other than 0.1), we’ll nevertheless try out a naive approach here: combine the independently tuned best parameter values and train a model, which gives us:
    Finished cross-valiation. Mean Accuracy: 78.00%, Standard Deviation: 4.59%
Wow, that’s a brutal 50 minutes! Although we cannot complain about the result, since it’s state of the art! It seems the naive approach does work.
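The naive approach boils down to picking the best value for each parameter independently and merging those picks into the defaults. A minimal sketch, assuming the per-parameter results are stored as a dictionary of accuracies, could look like this. The function `combine_best` is hypothetical and the numbers are made up.

```python
def combine_best(defaults, results):
    """Pick, per parameter, the value with the highest accuracy and merge into defaults."""
    combined = dict(defaults)
    for name, accuracy_by_value in results.items():
        combined[name] = max(accuracy_by_value, key=accuracy_by_value.get)
    return combined


defaults = {"epochs": 10, "learning_rate": 0.001, "optimizer": "adam"}
# Made-up accuracies for illustration; not the measurements from this post.
results = {
    "epochs": {10: 0.71, 40: 0.75},
    "learning_rate": {0.001: 0.71, 0.1: 0.66},
}
best_params = combine_best(defaults, results)
print(best_params)
```

Parameters that were never tested keep their default values. Nothing in this merge checks whether the chosen values still work well together, which is exactly the caveat about interactions mentioned above.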
Parameter Tuning
Now that we see the relative importance of the parameters, it’s time to tune the model.
As learning rate is the most important one, let’s tackle it first.
We’ll use the following code to generate 6 random learning rate values between 0.0001 and 0.01, since this is the most promising area based on the above tuning trend visualization.
    bases = np.repeat(10, 3)
    exponents_1 = -(np.random.rand(3) + 3)
    exponents_2 = -(np.random.rand(3) + 2)
    learning_rate = np.power(bases, exponents_1).tolist() + np.power(bases, exponents_2).tolist()
After running the test, we got a result that points us to 0.0006716184352348816 as the best learning rate.
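The learning rates are sampled on a log scale: the exponent is drawn uniformly, not the value itself. Drawing the values uniformly between 0.0001 and 0.01 would put roughly 90% of them above 0.001. Below is a minimal standard-library sketch of the same idea. The helper `sample_log_uniform` is hypothetical, and the fixed seed is only there to make the sketch repeatable.

```python
import random


def sample_log_uniform(low_exp, high_exp, n, rng):
    """Draw n values 10**e with e uniform between low_exp and high_exp."""
    return [10 ** rng.uniform(low_exp, high_exp) for _ in range(n)]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
# Three values per decade, so both decades are covered evenly.
learning_rates = (sample_log_uniform(-4, -3, 3, rng)
                  + sample_log_uniform(-3, -2, 3, rng))
print(sorted(learning_rates))
```

Sampling three values per decade guarantees that both halves of the range are represented even with only 6 draws.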
Let’s use this and continue tuning batch size, also with 6 options, since we definitely want to trust Prof. Ng’s guideline that batch size is the second most important parameter :)
    batch_size = [2 ** e for e in range(6)]
Although batch size 2 has a higher accuracy result, the time cost significantly outweighs the benefit, so we’ll go with a batch size of 16.
After updating the batch size value in our parameters dictionary, we can now proceed to tune the number of epochs.
Since the time taken to train and test increases with the number of epochs, it’s better to tune this parameter at a later stage to avoid long running times.
Tuning gives us 200 as the best number of epochs.
Next, let’s build the final model, with standardized features:
    run_test(X=X_std, y=y, param_dict=param_dict)
That gives us:
    Finished cross-valiation. Mean Accuracy: 78.53%, Standard Deviation: 3.64%
Absolutely great result! The time taken is not too bad, although it’s 1422 times more than XGBoost.
Now, what if we don’t tune the parameters and just standardize the features?
    Finished cross-valiation. Mean Accuracy: 76.95%, Standard Deviation: 2.88%
So it seems that parameter tuning’s effect is a bit marginal, but standardization, i.e. making features have zero mean and unit variance, is huge for neural network model tuning.
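For reference, standardizing a feature means subtracting its mean and dividing by its standard deviation. The sketch below does this with the standard library only. The helper `standardize` and the feature values are made up, and how the standardized features were actually produced for the run above is not shown here.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) variance."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant feature: nothing to scale
    return [(x - mean) / std for x in column]


# Two made-up features on very different scales.
feature_a = [85.0, 168.0, 122.0, 99.0]
feature_b = [0.2, 0.7, 0.4, 0.3]
columns_std = [standardize(col) for col in (feature_a, feature_b)]

for col in columns_std:
    print(round(statistics.mean(col), 6), round(statistics.pstdev(col), 6))
```

After this step both features live on the same scale, so no single input dominates the gradient updates just because its raw numbers are larger. In practice the mean and standard deviation would usually be computed on the training folds only and then applied to the test fold.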
Summary
Learning rate is the most important parameter to tune since it can yield big performance improvements while not negatively affecting training time.
Smaller batch sizes may provide better results, but they are also more time-consuming!
Similarly, training for more epochs generally helps improve accuracy, but the time cost is also high.
Optimizer can be an important parameter to tune.
Deeper and wider neural networks may not always be helpful.
Feature standardization can greatly improve model performance and is an easy win compared with parameter tuning.
Neural networks are great, but they are not for everything.
As we showed above, training and tuning a neural network model can take thousands if not millions of times longer than non-neural models!
Neural networks are best fit for use cases such as computer vision and natural language processing.
You can find the complete code in my project repo on GitHub.
Do give it a try and see what results you can get!
Thank you for reading! Is there anything that I can improve on? Kindly let me know below.
We all get better by learning from each other!