Better Understanding Negative Log Loss

But I was seeing the opposite effect.

My next attempt at understanding the observed behavior was to use a sufficiently high and low value, but not something as drastic as epsilon.

I chose 0.99999 for the high value and a correspondingly small value for the low one.

Using these, my loss dropped: an improvement, but still a lot higher than with my unaltered predictions.
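For concreteness, the kind of adjustment I was experimenting with looks roughly like the sketch below (using numpy; the predictions and the exact low value are illustrative, not my actual submission code):

```python
import numpy as np

# Hypothetical model outputs: predicted probability that each image is a dog.
preds = np.array([0.98, 0.91, 0.74, 0.03824, 0.99])

# Push every prediction toward an extreme, but less drastically than epsilon.
# The low bound here is illustrative, paired with the high value.
high, low = 0.99999, 0.00001
boosted = np.where(preds >= 0.5, high, low)

print(boosted)  # every prediction is now at one extreme or the other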

It was clear that my modification was worsening the error rather than improving it, but I didn’t understand why yet.

Mathematically, it should have improved.

I felt the only way to get more clarity was to check what was really happening to my predictions.

Since I was using a Kaggle dataset, I didn’t have the labels for the test set.

I painstakingly labeled the first 500 images manually, and then calculated the loss based on my predictions.

The loss was 0.00960, an excellent score.

I then calculated the loss with my modified predictions, and it was significantly higher. But this time, I had the data to identify the root of the problem.
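The comparison itself is easy to reproduce with scikit-learn's log_loss. Here is a minimal, self-contained sketch, with a handful of placeholder values standing in for the 500 hand-labelled images and my predictions:

```python
import numpy as np
from sklearn.metrics import log_loss

# Placeholder stand-ins for the hand-labelled images (1 = dog, 0 = cat)
# and the predicted dog-probabilities; one prediction is confidently wrong.
y_true = np.array([1, 1, 0, 1, 0])
preds = np.array([0.97, 0.92, 0.04, 0.03824, 0.02])
boosted = np.where(preds >= 0.5, 0.99999, 0.00001)

print(log_loss(y_true, preds))    # loss with the original predictions
print(log_loss(y_true, boosted))  # the one confident mistake now costs far more
```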

On closer inspection of my predictions and the actual labels, I noticed that my model had made an error.

It had labelled one dog as a cat, and with a prediction of 0.03824 it was pretty confident that the image was indeed a cat.

My boosting logic had taken this value and pushed it closer to 0.

Therein was the source of the problem.

Log Loss penalizes incorrect predictions heavily. My one incorrect prediction was already costing me in my loss, but my alteration of that prediction exacerbated the error, causing the loss to increase even further.
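A quick back-of-the-envelope calculation shows how much a single confident mistake can move an average over 500 images (the boosted value below is illustrative, not the exact value I used):

```python
import math

n = 500                 # number of hand-labelled images
p_original = 0.03824    # predicted probability of "dog" for the mislabelled dog
p_boosted = 0.00001     # illustrative value after pushing it closer to 0

# Contribution of this single image to the mean log loss.
print(-math.log(p_original) / n)  # roughly 0.0065
print(-math.log(p_boosted) / n)   # roughly 0.0230
```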


A good explanation of this can be found in this blog, excerpts of which I include below.

Let’s say the actual value is 1.

If your model was unsure and predicted 0.5, the loss would be:

Loss = -(1 * log(0.5)) = 0.69314

If your model was correctly confident and predicted 0.9, the loss would be:

Loss = -(1 * log(0.9)) = 0.10536

The loss drops when the prediction is closer to the actual value.

If your model was incorrect but also confident and predicted 0.1, the loss would be:

Loss = -(1 * log(0.1)) = 2.30258

The loss gets much worse. When dealing with the Log Loss function, it is better to be doubtful of your prediction than to be confidently wrong.
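These three numbers are easy to verify yourself; log loss uses the natural logarithm:

```python
import math

# Single-example log loss when the actual label is 1.
for p in (0.5, 0.9, 0.1):
    print(p, -math.log(p))
# 0.5 -> 0.69314..., 0.9 -> 0.10536..., 0.1 -> 2.30258...
```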

This was my oversight: I was assuming that when my model predicted confidently, it was always correct.

If it was getting some of the images wrong, I assumed it must be predicting in the neighborhood of 0.5.
I completely overlooked the case where my model was confidently predicting incorrectly.


. More details

Leave a Reply