Machine Learning Algorithms from the Easy Side (Part 2)

Didier Itembe, Mar 13

In the first part, we explained what ML is and covered some machine learning algorithms.

In this second part we will go deeper and look at further methods that can help us teach a machine to perform a task.

SVM (Support Vector Machine)

Now we look at another method worth thinking about.

In the example we have three red and three gray dots separated by more than one line.

Now we will investigate which of the lines best fits the data.

We can see that the green line is very close to the points, while the pink line is not as close to the points.

The pink line seems to be just far enough away from the dots and is thus able to separate them well.

The pink line wins here over the green line.

As with logistic regression, we have to explain how to find the line that best fits the points.

We should work with the distances from the dots to the lines.

Here we look at the distance of each point to the lines; the minimum of these distances tells us how close the nearest dot lies to each line.

By only looking at the minimum of the six distances, we can ignore the points that are very far away from the lines.

From this, we can conclude that the pink line separates the data better, since the minimum for the pink line is greater than the minimum of the green line.
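The comparison above can be sketched in a few lines of Python. The dots and the two candidate lines are made up for illustration, since the article's figure is not reproduced here:

```python
import math

def distance_to_line(a, b, c, x, y):
    # Distance from point (x, y) to the line a*x + b*y + c = 0
    return abs(a * x + b * y + c) / math.hypot(a, b)

# Hypothetical dots standing in for the figure's red and gray points
points = [(1, 1), (2, 1), (1, 2),   # red cluster
          (4, 4), (5, 4), (4, 5)]   # gray cluster

green = (1, 1, -4)    # x + y - 4 = 0, hugging the red cluster
pink = (1, 1, -5.5)   # x + y - 5.5 = 0, midway between the clusters

min_green = min(distance_to_line(*green, x, y) for x, y in points)
min_pink = min(distance_to_line(*pink, x, y) for x, y in points)
print(min_pink > min_green)  # True: the pink line keeps a larger margin
```

The line with the larger minimum distance is the better separator, just as in the text.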

The goal here is to maximize this minimum distance, again using gradient descent as in the previous methods.

The algorithm is called support vector machine (SVM).

The support vectors here are the dots that are closest to the hyperplane.

The pink line is the hyperplane that segregates the dots.

SVM can be used for classification tasks.
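As a rough illustration of "maximize the margin with gradient descent", here is a bare-bones training loop using the hinge loss. The data, learning rate, and iteration count are all made up, and a real SVM would also regularize the weights:

```python
# Hypothetical 2D points with labels -1 (red) and +1 (gray)
data = [((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
        ((4, 4), 1), ((5, 4), 1), ((4, 5), 1)]

w1, w2, b = 0.0, 0.0, 0.0
lr = 0.05  # learning rate

for _ in range(1000):
    for (x, y), label in data:
        # Hinge loss: only points on the wrong side of the margin
        # push the line away from themselves
        if label * (w1 * x + w2 * y + b) < 1:
            w1 += lr * label * x
            w2 += lr * label * y
            b += lr * label

# Every point should now lie on the correct side of the line
print(all(label * (w1 * x + w2 * y + b) > 0 for (x, y), label in data))
```

Because the clusters are linearly separable, the updates stop firing once every point sits outside the margin.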

Neural Networks

Here we will continue with our tumor data.

The data are now arranged like in the next image.

It is a new model.

Sometimes the data can be arranged like this:

Unfortunately, in this case we are unable to use a line to separate the data.

We can either use more than one line or a circle to separate the data.

By using gradient descent, we can minimize the error function and find these lines.

This method is called Neural Network.

The name is inspired by how the human brain works, especially when multitasking. E.g. a person can walk down the street while using their cell phone (which can be dangerous).

Now let us say we have a computer with little power that is unable to do more than one task at a time.

If we want to know whether a new data point belongs to the malignant tumor type, for example, we have to split the big task into many small tasks.

The first task or question would be: is the new data over the purple line? The answer is yes.

The next question would be: is this new data over the green line? The answer would also be yes.

With the two answers being yes, we can conclude that the new data point is a malignant tumor.

Thus we can complete the other regions with yes or no.

The area at the bottom right would have the answers 1-No/2-Yes.

In the top left area, we would have 1-Yes/2-No and finally in the bottom left area, we would have 1-No/2-No.

Now we can represent the tasks in a graph with nodes. For this small graph with green nodes, we ask whether the data point with coordinates (recurrence grade = 70%, growing speed grade = 20%) is over the green line or not; the answer is no.

The same process is done with another graph, where we ask whether the data point with coordinates (recurrence grade = 70%, growing speed grade = 20%) is over the purple line or not; the answer is also no.

For the next question, we just combine the outputs of the two graphs above into a new node.

This combination of two values is done using the AND logic.

Let us take a look at this AND operator.

It takes two inputs, each Yes or No (or as numbers 1 or 0), and has one output.

If we enter Yes and Yes (or 1 and 1), the output is Yes (or 1).

If we enter Yes and No (or 1 and 0), the output is No (or 0).

If No and No (or 0 and 0) are the inputs, the output is also No (or 0).

The combination of the new node with the two small graphs is called a neural network.

In the neural network, we first have the input layer, where we enter recurrence grade = 70% and growing speed grade = 20%.

Then the information about the grades in the input layer is forwarded to the middle layer.

From the nodes in the middle layer come the two answers, no and no.

These are then forwarded to the output layer, where they are evaluated by the AND logic; the output of the AND logic, and thus of the neural network, is a no.
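The whole forward pass described above fits in a few lines. The coefficients of the purple and green lines are invented here, since the real ones come from the article's figures:

```python
def above_line(w1, w2, b, x1, x2):
    # "Is the point above the line w1*x1 + w2*x2 + b = 0?" -> 1 or 0
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def and_gate(a, b):
    # The AND logic of the output layer
    return 1 if a == 1 and b == 1 else 0

def tiny_network(recurrence, growth):
    # Middle layer: one node per separating line (hypothetical coefficients)
    purple = above_line(1.0, 0.0, -0.8, recurrence, growth)  # recurrence > 80%?
    green = above_line(0.0, 1.0, -0.5, recurrence, growth)   # growth > 50%?
    # Output layer: AND of the two answers
    return and_gate(purple, green)

print(tiny_network(0.70, 0.20))  # 0 -> not malignant (both answers are no)
print(tiny_network(0.90, 0.60))  # 1 -> malignant (both answers are yes)
```

Adding more nodes to the middle layer corresponds to adding more separating lines, which is how the network handles more complex regions.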

Additional layers and nodes can be added to this network to solve more complex tasks.

This is a powerful and great machine learning algorithm.

It is used in many projects, such as driver assistance systems, cursive handwriting recognition, detection of bombs in suitcases using TNA (Thermal Neutron Analysis), and maybe, in the future, mind reading, and many more.

Kernel Method

In this example, we will see a new method that can transform linearly non-separable data into linearly separable data.

We have points arranged as follows.

In these cases, it is not possible to use a line to separate the points.

Here we have to proceed differently.

We can imagine that the points are shown in a grid and then we separate them using a curve.

Likewise, we can imagine that the points are in space, and use a plane to separate them.

For this we add an additional axis, the z-axis.

Then the two red dots are moved along the z-axis by mapping the coordinates (x, y) of the dots in 2D to coordinates (x, y, z) in 3D, where z is some function of x and y, such as x³y or x + y². Subsequently, we are able to separate the points using a plane.

The two tricks achieve the same thing.

This approach is mainly used in support vector machines and is called the kernel trick.
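Here is a minimal sketch of the lifting trick, assuming z = x·y as the extra coordinate (the article allows any function of x and y) and using illustrative points:

```python
def lift(x, y):
    # Map a 2D point to 3D by adding the coordinate z = x*y
    return (x, y, x * y)

blue = [(4, 0), (0, 4)]  # dots on the axes
red = [(1, 3), (3, 1)]   # dots off the axes

blue3d = [lift(x, y) for x, y in blue]
red3d = [lift(x, y) for x, y in red]

# In 3D, the horizontal plane z = 1.5 now separates the two groups
print(all(z < 1.5 for _, _, z in blue3d))  # True
print(all(z > 1.5 for _, _, z in red3d))   # True
```

In 2D no line separates these dots, but after the lift a flat plane does.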

We will continue here with the dots arranged like in (a) and the curve as separator.

To separate the points, we will try a few candidate equations.

We have xy, x+y, x², x³.

The coordinates of the points are plugged into the equations.

The outputs will give us more information on how well the equations or functions can separate the points.

We use a table to show the results.

The first line corresponds to the coordinates of the points (from left to right) and the first column contains the equations.

We can see all the results in the table.

E.g. for (4,0) in x³, that makes 4×4×4 = 64; for (1,3) in x+y, we have 1+3 = 4.
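The table can be recomputed directly. The four coordinates below are reconstructed from the results quoted in the text ((4,0) and (1,3) are named; (0,4) and (3,1) are inferred), so treat them as an assumption:

```python
points = {"blue1": (4, 0), "blue2": (0, 4), "red1": (1, 3), "red2": (3, 1)}

features = {
    "xy":  lambda x, y: x * y,
    "x+y": lambda x, y: x + y,
    "x^2": lambda x, y: x * x,
    "x^3": lambda x, y: x ** 3,
}

# Evaluate every feature on every point, like the article's table
table = {name: {label: f(x, y) for label, (x, y) in points.items()}
         for name, f in features.items()}

for name, row in table.items():
    print(name, row)
```

Only the xy row gives one value for all blue dots (0) and a different single value for all red dots (3), which is the separation discussed next.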

Now the question is: which equation separates the points?

First, we have x+y.

We can see that the coordinates of two blue points and two red points give the same result, 4.

Thus, the equation cannot separate the points.

For x² and x³, the blue and red points do give different results, but the red points' values lie between the blue points' values.

Thus, these equations cannot separate the points either.

Finally, we have xy, and notice that the blue points have the same value 0 and the red points have the same value 3.

The function can separate the points because it gives us a single value for the red points and another single value for the blue points.

We have xy=0 for the blue points and xy=3 for the red points.

We know that 1 and 2 lie between 0 and 3, so values like 1 and 2 separate 0 and 3.

This gives us the equations xy=1 and xy=2 and thus the functions y=1/x and y=2/x.

From this we conclude that the function y = 1/x (or y = 2/x) separates the points in the plane.

With this method, non-linear data can be mapped to a higher-dimensional (n-dim) space.

In this new space, the data can be easily separated using a plane or a curve.

This method can be used for handwriting recognition, 3D reconstruction, geostatistics and many more.

Conclusion

In this article, divided into two parts, we have seen several important algorithms used in machine learning.

The goal was to explain these algorithms in a simple way using various examples.

Now we arrive at the end of our “easy dive”.

So I wish you a lot of fun in the beautiful world of machine learning.
