That’s the optimizer.

We are using a variant of stochastic gradient descent here, specifically ‘adam’.

There are several others; you’ll learn about them as you dive deeper.

Loss: It is basically the cost function; you can use whichever name you like.

Our model will not produce the right output on the first attempt. It will make errors, measure them (the loss), and then learn from them.

We are using the mean squared error function; check out the others as well.

Metrics: Used to measure the accuracy of the trained model.

Step 4 — Training our Model

Here comes the best part: watching our network train in real time.

After providing the training data, there are two other parameters.

First is batch size. I’ve set it to 10, which means the weights are updated after evaluating each batch of 10 samples; this is called mini-batch learning.

However, another approach is to update the weights after every single sample; this is called stochastic (online) learning.

And finally there’s epoch — I am sure you’ve heard/seen this word many times — one epoch is one complete pass over the entire training data.

So 100 epochs means repeating that full pass 100 times.
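The training call with those two parameters can be sketched as below. `X_train`/`y_train` stand in for the prepared training data; random placeholder values are used here only so the example is self-contained, and the architecture is the same illustrative one assumed earlier.

```python
# Sketch of the fit step: batch_size=10 (mini-batch learning),
# epochs=100 (100 full passes over the training data).
import numpy as np
from tensorflow import keras

X_train = np.random.rand(100, 30)                 # placeholder: 100 samples, 30 features
y_train = np.random.randint(0, 2, size=(100, 1))  # placeholder yes/no labels

model = keras.Sequential([
    keras.Input(shape=(30,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mean_squared_error", metrics=["accuracy"])

# Weights update after every batch of 10 samples; the whole dataset
# passes through the network 100 times.
history = model.fit(X_train, y_train, batch_size=10, epochs=100, verbose=0)
```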

That’s Colab

As you can see, the model we built seems to have 98.03% accuracy on the training data.

So let’s predict our output now and then compare it with the actual output to see how well our model works on new data.

Here comes the challenge, Mr. Classifier. Get ready.

Step 5 — Predict Output

Predicting is easy after all that we have done.

Just use the predict method on the testing data and we are done.

Moreover, as we just need yes or no as an output, we will classify the predictions.

A general rule of thumb: if the value is more than 0.5, it is positive, and if less than that, negative.
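That thresholding step can be written in one line. Here `predictions` stands in for the output of `model.predict` on the test data; the values below are illustrative placeholders.

```python
# Turn sigmoid outputs into yes/no labels using the 0.5 threshold.
import numpy as np

predictions = np.array([0.91, 0.08, 0.63, 0.35])  # placeholder model outputs
labels = predictions > 0.5                        # True = positive, False = negative
print(labels)  # [ True False  True False]
```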

We have our output, let’s cross-validate.

Step 6 — Cross Validate

Cross-validation is like a walk in the park, thanks to the confusion matrix.

It is nothing but a 2×2 matrix counting all correct and incorrect predictions.

I’ll show you how.

Look at the image below.

Source

Here, TN is True Negative: the actual output was no and our model also predicted no.

TP is True Positive; I think I don’t have to explain that now.

These two are our correct answers, and the other two are our errors.
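In code, scikit-learn builds this matrix for you. The sketch below assumes `y_test` holds the actual labels and `y_pred` the thresholded predictions; tiny illustrative values are used here.

```python
# Build the 2x2 confusion matrix and unpack its four counts.
from sklearn.metrics import confusion_matrix

y_test = [0, 1, 1, 0, 1]  # placeholder actual labels
y_pred = [0, 1, 0, 0, 1]  # placeholder predicted labels

cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()  # sklearn orders the matrix [[TN, FP], [FN, TP]]
print(tn, fp, fn, tp)  # 2 0 1 2
```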

To find the accuracy, we add up the number of correct observations and divide it by the total number of observations.

You’ll get a better idea in a minute.

Confusion Matrix

Here, in our test data, we have 140 entries.

Out of these, our confusion matrix says we have 82 + 54 = 136 correct observations and 3 + 1 = 4 incorrect ones.

Therefore, the accuracy of our model on new test data would be 136/140 = 0.9714, which is 97.14% accuracy.
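Spelled out as arithmetic, the calculation is just this. Which of the two wrong counts is FP versus FN is an assumption here; it does not affect the accuracy.

```python
# Accuracy from the confusion-matrix counts reported above.
tn, tp = 82, 54   # correct predictions
fp, fn = 3, 1     # incorrect predictions (assignment assumed)

correct = tn + tp            # 136
total = correct + fp + fn    # 140
accuracy = correct / total
print(round(accuracy, 4))    # 0.9714
```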

Not at all bad for the first time.

Endnotes

I was theoretically aware of what neural networks are and how they work, but I was not able to find the right resources to get started with practically building one. So I thought I’d assemble the pieces and create a decent blog post to make it easier for people just like me.

Moreover, there are several ways we can improve our model.

If you want me to work on an article for that too, let me know in the comments.

Here is the GitHub repository for the data files and code.

Also, my Twitter and LinkedIn DMs are always open.

If you have any feedback, let me know.

Even if you don’t, you can digitally come by and say hello.

Always love to connect with new people.

Find further reading below.

Happy Learning.

____

Further Reading

Analysis of the Wisconsin Breast Cancer Dataset (link)
ML fundamentals (Cost functions and gradient descent) (link)
Gentle Introduction to the Adam Optimization Algorithm (link)