A Running Speed Benchmark for Kernel & Non-Kernel AlgorithmsAre non-kernel always faster than their counterpart kernel-based algorithms?Ori CohenBlockedUnblockFollowFollowingJul 9The Kernel method, Wikipedia, CC BY-SA 4.

0, https://en.

wikipedia.

org/wiki/Kernel_methodFor many years I knew that Linear SVM is faster than the kernel version, this was common knowledge rather than something I have tested.

A while back I wondered if all kernel-based algorithms are slower than their non-kernel versions.

The following comparison serves as a running time benchmark for several well-known algorithms.

Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space.

This operation is often computationally cheaper than the explicit computation of the coordinates.

This approach is called the “kernel trick” — from wikipediaFor those of you who want to run this benchmark straight away, you can find the notebook in my Github.

The dataset used for this benchmark is the breast cancer dataset, which is small but well known.

You can experiment with other datasets in order to see if the empirical results stay the same, I suspect they will.

We’ll compare the following algorithms:Linear SVM vs Linear Kernel SVMPCA vs Linear Kernel PCAK-means vs Kernel K-meansPDF vs KDE (Gaussian) vs Gaussian KDEThe comparison is two-fold, once for “fit” and another for “predict” or “transform”.

please note that I couldn't find an equivalent transform method for probability-density-function (PDF).

Support vector machine (SVM), principal-component analysis (PCA), kernel-density-estimation (KDE) and K-means are well known and have the relevant functions, for Kernel-K-means I had to look in an external package called Tslearn.

Table 1: Running time for ‘fit’Table 2: Running time for ‘predict’ or ‘transform’Results & DiscussionIt seems like there is a significant difference between the two types, for ‘fit’ (Table 1), notably linear SVM is twice as fast as its kernel counterpart, PCA is ~30 times faster, and K-means is around 40 times faster.

For ‘predict’ (Table 2) or ‘transform’, we see the same picture, but not on the same scale.

There may be many reasons for the speed differences.

If you are interested it's worth checking the implementation and the time complexity for each algorithm, which are not covered here.

Unless you really need a kernel version of a certain algorithm, you should first consider the non-kernel version.

It's worth noting that the fastest versions exist in Sklearn, which means that you don't need to look at other packages.

Dr.

Ori Cohen has a Ph.

D.

in Computer Science with focus in machine-learning.

He leads the research team in Zencity.

io, trying to positively influence citizen lives.

.