The prediction is correct!

Classifying Images

Building a Neural Network

Our Neural Network will have only 1 hidden layer.
We will implement a somewhat more sophisticated version of our training algorithm shown above along with some handy methods.
Initializing the weights

We’ll sample a uniform distribution with values between -1 and 1 for our initial weights.
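A minimal sketch of what such an initializer might look like, assuming NumPy and a single hidden layer (the function name and dimension arguments here are illustrative, not the author's original listing):

```python
import numpy as np

def init_weights(n_input, n_hidden, n_output, seed=42):
    # Sample every weight from a uniform distribution over [-1, 1)
    rng = np.random.RandomState(seed)
    w1 = rng.uniform(-1.0, 1.0, size=(n_input, n_hidden))
    w2 = rng.uniform(-1.0, 1.0, size=(n_hidden, n_output))
    return w1, w2
```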
Training

Let’s have a look at the training method. For each epoch, we apply the backprop algorithm and evaluate the error and the gradient with respect to the weights.
We then use the learning rate and gradients to update the weights.
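As a rough sketch, the loop might look like this, assuming a `backprop` function that returns the error together with the gradients (all names here are assumptions):

```python
import numpy as np

def train(x, y, w1, w2, backprop, n_epochs=100, learning_rate=0.1):
    # For each epoch: run backprop, record the error, then take a
    # gradient-descent step scaled by the learning rate
    errors = []
    for _ in range(n_epochs):
        error, dw1, dw2 = backprop(x, y, w1, w2)
        errors.append(error)
        w1 = w1 - learning_rate * dw1
        w2 = w2 - learning_rate * dw2
    return w1, w2, errors
```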
Doing a backprop step is a bit more complicated than in our XOR example. We do an additional step before returning the gradients: applying L1 and L2 regularization. Regularization is used to guide our training towards simpler models by penalizing large values for our parameters W.
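For illustration, the regularization step in the backward pass might look like this (the penalty strengths `l1` and `l2` are assumed hyperparameters, not values from the original code):

```python
import numpy as np

def regularize_gradients(w, dw, l1=0.001, l2=0.01):
    # L2 penalizes the squared magnitude of the weights, so its gradient
    # contribution is proportional to the weight itself; L1 penalizes the
    # absolute value, so its contribution follows the sign of the weight
    return dw + l2 * w + l1 * np.sign(w)
```

Applying this to each gradient before the weight update nudges the weights toward zero, which is what penalizing large parameter values amounts to.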
Our forward and backward steps are very similar to the ones in our previous example. How about the error?

Measuring the error

We’re going to use the Cross-Entropy loss (also known as log loss) function to evaluate the error.
This function measures the performance of a classification model whose output is a probability.
It penalizes (harshly) predictions that are wrong and confident.
Here is the definition:

CrossEntropy = -∑ (from c = 1 to C) y(o, c) * log(p(o, c))

where C is the number of classes, y(o, c) is a binary indicator of whether class label c is the correct classification for observation o, and p(o, c) is the predicted probability that observation o is of class c.

Now that we have our loss function, we can finally define the error for our model. After computing the Cross-Entropy loss, we add the regularization terms and calculate the mean error.
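A sketch of the loss and the combined error might look as follows (the function names and penalty strengths are assumptions, not the original listing):

```python
import numpy as np

def cross_entropy(y, probs, eps=1e-12):
    # y is one-hot encoded, probs holds predicted probabilities; both have
    # shape (n_observations, n_classes). Clipping avoids log(0), and a
    # confident wrong prediction still produces a very large loss.
    probs = np.clip(probs, eps, 1.0 - eps)
    return -np.sum(y * np.log(probs), axis=1)

def l1_penalty(w, lam=0.001):
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam=0.01):
    return lam * 0.5 * np.sum(w ** 2)

def error(y, probs, weights, l1=0.001, l2=0.01):
    # Mean Cross-Entropy loss plus the L1 and L2 regularization terms
    reg = sum(l1_penalty(w, l1) + l2_penalty(w, l2) for w in weights)
    return np.mean(cross_entropy(y, probs)) + reg
```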
Making predictions

Now that our model can learn from data, it is time to make predictions on data it hasn’t seen before.
We’re going to implement two methods for prediction: predict and predict_proba. Recall that making predictions with a Neural Network (generally) involves applying a forward step to the data. But the result of that step is a vector of values representing how strong the belief in each class is for that input. We’ll use Maximum Likelihood Estimation (MLE) to obtain our final predictions. MLE works by picking the highest value and returning it as the predicted class for the input.
The method predict_proba returns a probability distribution over all classes, representing how likely each class is to be correct.
Note that we obtain it by applying the softmax function to the result of the forward step.
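Put together, the two methods might be sketched like this (the `forward` argument stands in for the network's forward step and is an assumption):

```python
import numpy as np

def softmax(z):
    # Shift by the row maximum for numerical stability, then normalize
    # the exponentiated scores into a probability distribution
    z = z - np.max(z, axis=1, keepdims=True)
    exp = np.exp(z)
    return exp / np.sum(exp, axis=1, keepdims=True)

def predict_proba(x, forward):
    # forward(x) returns the raw output scores of the network
    return softmax(forward(x))

def predict(x, forward):
    # MLE: pick the class with the highest probability
    return np.argmax(predict_proba(x, forward), axis=1)
```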
Evaluation

Time to put our NN model to the test.
Here’s how we can train it:

The training might take some time, so please be patient.
Let’s get the predictions:

First, let’s have a look at the training error:

Something looks fishy here. It seems like our model can’t continue to reduce the error after 150 epochs or so.
Let’s have a look at a single prediction:

That one seems correct! Let’s have a look at a few more:

Not too good.
How about the training & testing accuracy?

Well, those don’t look that good. While a random classifier would return ~10% accuracy, ~50% accuracy on the test dataset does not make for a practical classifier either.
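Accuracy here is just the fraction of predictions that match the true labels; a minimal helper (assumed, not from the original code) makes that concrete:

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of predicted labels equal to the true labels
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))
```

A random guess among 10 digit classes matches roughly 1 time in 10, which is where the ~10% baseline comes from.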
Improving the accuracy

That “jagged” line on the training error chart shows the inability of our model to converge.
Recall that we use the Backpropagation algorithm to train our model.
Training a Neural Net converges much faster when the data is normalized.
We’ll use scikit-learn’s scale to normalize our data. The documentation states:

“Center to the mean and component wise scale to unit variance.”
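That is, each feature column is shifted to zero mean and divided by its standard deviation. As a sketch, a NumPy equivalent of what `scale` does looks like this (in the text itself we rely on scikit-learn's implementation):

```python
import numpy as np

def scale_features(x):
    # Center each column to zero mean, then divide by its
    # standard deviation so each feature has unit variance
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)
```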
Here is the new training method:

Let’s have a look at the error:

The error seems a lot more stable and settles at a lower point: ~200 vs ~400.
Let’s have a look at some predictions:

Those look much better, too! Finally, the accuracy:

~87% (vs ~50%) on the training set is a vast improvement over the unscaled method.
Finally, your hard work paid off!

Conclusion

What a ride! I hope you had a blast working on your first Neural Network from scratch, too! You learned how to process image data, transform it, and use it to train your Neural Network.
We used some handy tricks (scaling) to vastly improve the performance of the classifier.
Originally published at https://www.