Kernel Secrets in Machine Learning Pt. 2

The mapping function for the polynomial kernel of degree 2 in 2-dimensional space, k(x, x') = (xᵀx')², looks like this:

φ(x) = (x₁², √2 x₁x₂, x₂²)

When we increase the input dimension d and the degree of the polynomial, the feature space that we map into becomes quite big.

The good thing is that we can calculate this dot product without performing the transformation at all, as specified in the formula above: we simply evaluate the kernel directly on the original inputs.

This is one of the many beautiful formulations in kernel theory.
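To make this concrete, here is a minimal sketch in Python (using NumPy; the helper names phi and poly_kernel are my own) showing that the dot product of the explicitly mapped features equals the kernel evaluated directly in the input space:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-dimensional input,
    # corresponding to the kernel k(x, x') = (x . x')**2.
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, y):
    # The same quantity computed directly in the input space.
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(y)))  # dot product in the feature space
print(poly_kernel(x, y))       # identical value, no explicit mapping needed
```

Both print statements output the same number, which is exactly the point of the kernel trick.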

The radial basis function kernel

This is a very famous and often-used kernel. It is commonly written as

k(x, x') = exp(−‖x − x'‖² / (2σ²)).

Notice that because of the negative exponent in the exponential, the value of the kernel ranges between 0 and 1. This is a nice feature: a value of 1 means that two points are very similar (or identical), while a value close to 0 means they are completely different.

The σ parameter in the exponential controls the sensitivity of the kernel.

For a low σ, only really close points are considered similar.

For a bigger σ, we relax the similarity criterion: points that are a bit more distant are still considered fairly similar.
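As a small sketch of this effect (the function name rbf_kernel and the 2σ² parametrization follow the formula assumed above), here is how the similarity between the same two points changes as σ grows:

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    # Radial basis function kernel: exp(-||x - y||^2 / (2 * sigma^2)).
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

x = np.array([0.0])
y = np.array([1.5])

for sigma in [0.5, 1.0, 3.0]:
    # A larger sigma makes the same pair of points look more similar.
    print(sigma, rbf_kernel(x, y, sigma))
```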

But of course, the plots above show the kernel with x fixed to 0 while x' varies; in general, we want to calculate similarities between points over the whole domain of X.

This hints at a plane: plotting the kernel over the whole (x, x') plane gives an example of what it looks like. Unsurprisingly, the value of the kernel is highest along the diagonal, where x and x' are the same.

Periodic kernel

When you think about periodicity, you automatically think about periodic functions like sine and cosine.

Logically, the periodic kernel has the sine function in it.

The hyperparameters of the kernel are, once again, σ, which specifies the sensitivity of the similarity, and additionally the parameter p, which specifies the period of the sine function.

This should completely make sense.

Also, notice the similarity between the radial basis function kernel and the periodic kernel: both are restricted to outputting values between 0 and 1.

When would you want to use a periodic kernel? It is quite logical: let's say that you want to model a sine-like function.

Taking two points from this function that are far apart in terms of Euclidean distance most certainly does not mean that the values of the function at those points are meaningfully different.

In order to solve this kind of problem, you need periodic kernels.
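As a minimal sketch (using a common exp-sine-squared form of the periodic kernel; the exact parametrization is my assumption), here is how two points that are far apart in Euclidean distance, but exactly a few periods apart, still come out as maximally similar:

```python
import numpy as np

def periodic_kernel(x, y, sigma, p):
    # A common form of the periodic (exp-sine-squared) kernel.
    return np.exp(-2 * np.sin(np.pi * np.abs(x - y) / p) ** 2 / sigma ** 2)

p = 2.0  # period of the underlying sine-like function
print(periodic_kernel(0.0, 10.0, sigma=1.0, p=p))  # five periods apart -> similarity 1.0
print(periodic_kernel(0.0, 0.5, sigma=1.0, p=p))   # closer in Euclidean distance, yet less similar
```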

Just for the sake of completeness, you can also look at what happens when we tweak the period of the periodic kernel (nothing unexpected happens).

Locally periodic kernel

We arrive at this kernel by taking the radial basis function kernel and multiplying it with the periodic one.

What we achieve with this is that the resulting kernel's value also changes with the plain distance between x and x', and not only with the periodic component of that distance.

This results in so-called local periodicity.
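Here is a small sketch of this construction, reusing the kernel forms assumed in the snippets above (for simplicity, the same σ is used in both factors):

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def periodic_kernel(x, y, sigma, p):
    return np.exp(-2 * np.sin(np.pi * np.abs(x - y) / p) ** 2 / sigma ** 2)

def locally_periodic_kernel(x, y, sigma, p):
    # Product of the RBF and periodic kernels: the periodic similarity
    # is damped as the plain distance between x and y grows.
    return rbf_kernel(x, y, sigma) * periodic_kernel(x, y, sigma, p)

p = 2.0
print(locally_periodic_kernel(0.0, 2.0, sigma=1.0, p=p))   # one period apart, still fairly similar
print(locally_periodic_kernel(0.0, 10.0, sigma=1.0, p=p))  # five periods apart, similarity has decayed
```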

Just because I am very intrigued, let's plot this kernel in 3D; we get a nice, funky shape. Looks quite cool!

Constructing new kernels

Now that we have seen a few examples of kernels, the question arises: what do we need in order to construct new ones? There are two nice properties of kernels:

Adding a kernel to a kernel results in a new kernel.

Multiplying a kernel with a kernel results in a new kernel.

These properties allow you to build non-trivial kernels without doing much math per se, and it is really intuitive.

Multiplication can perhaps be looked at as an AND operation, especially when considering kernels bounded between 0 and 1.

Accordingly, we can arrive at a locally periodic kernel by combining the periodic kernel with the radial basis function kernel.
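In code, these closure properties can be sketched as small helpers that take two kernels and return a new one (the helper names are my own):

```python
import numpy as np

def add_kernels(k1, k2):
    # The sum of two kernels is again a valid kernel.
    return lambda x, y: k1(x, y) + k2(x, y)

def multiply_kernels(k1, k2):
    # The product of two kernels is again a valid kernel.
    return lambda x, y: k1(x, y) * k2(x, y)

# Simple one-dimensional kernels with fixed hyperparameters (sigma = 1, p = 2).
rbf = lambda x, y: np.exp(-(x - y) ** 2 / 2.0)
periodic = lambda x, y: np.exp(-2 * np.sin(np.pi * np.abs(x - y) / 2.0) ** 2)

# The locally periodic kernel is just the product of the two.
locally_periodic = multiply_kernels(rbf, periodic)
print(locally_periodic(0.0, 2.0))
```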

These were a few examples to get you started in your kernel adventures.

Of course, this is barely scratching the surface of all of the interesting kernels that are out there.

Kernel design tailored to a problem is a non-trivial task.

A certain level of experience is required to get good at it.

Also, there is a whole area in machine learning dedicated to learning kernel functions.

Kernel design can also be tricky because of algorithm requirements.

Since many kernel-based algorithms involve inverting something called the Gram matrix, we require our kernels to be positive definite, but this is something that I will cover in the future.
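As a rough illustration of what that means in practice (a sketch, not a formal check), the Gram matrix simply collects the kernel evaluated on every pair of training points, and for a valid kernel its eigenvalues should not be negative (up to numerical error):

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

X = np.random.randn(5, 2)  # five 2-dimensional points

# Gram matrix: K[i, j] = k(x_i, x_j)
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

# For a positive (semi-)definite kernel, no eigenvalue should be negative
# (up to numerical error); in practice a small ridge is often added to the
# diagonal before inverting.
print(np.linalg.eigvalsh(K))
```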

Now that we know a few useful kernels, we can dig a bit more into the theory of Hilbert spaces and how they relate to kernels, but this is going to have to wait till the next article.

Till then, here is some recommended reading if you had trouble following this article:

Kernel Secrets in Machine Learning Pt. 1

On the Curse of Dimensionality

Learning with Kernels

Good job!
