Regression: Kernel and Nearest Neighbor ApproachIn this article, I will talk about the Kernel and Nearest Neighbor Approach which forms a major class of non-parametric methods to solve a regression setting.
Kushal ValaBlockedUnblockFollowFollowingFeb 26Courtesy: Taken from pixabayRegression Algorithm has been widely solved by using a Least Square Algorithm, which is a parametric approach from a statistical point of view, in contrast, there are other learning algorithms which can be used to ‘predict’ continuous values.
Before delving into kernel-smoothing methods, I will talk about the Nearest Neighbor Approach to solve and its shortcomings (Yes! It's not just used for Classification solely.
)Understanding the concept of a kernel is the basis of these techniques.
Kernel is a weighing function which assigns weights to the neighboring points based on distance to the query point.
As aforementioned, we get an insight that these methods focus more on the distribution of points around our query point.
(the point where we want to estimate the output)k-Nearest Neighbor (k-NN) RegressionIntuition: An estimate is formed by averaging over the ‘k’ nearest data points which are defined by a function called Neighborhood (N)Figure 1: Estimate of Nearest Neighbor ApproachNeighborhood implies closeness to the query point.
Closeness is defined by a metric.
In this method, we use a naive approach of Euclidean kernel.
Euclidean Kernel is a fancy word for the square root of the distance between points.
More formally in statistics known as an L2-norm.
Figure 2: An equation of Euclidean KernelEuclidean kernel gives equal weights to all the neighboring points and has the shape of a rectangle.
Figure 3: k-NN Approach for Regression (k=30)As we see in figure 3, we see that weights are equally assigned for 30 neighboring points and the estimate is discontinuous and step-like output, which is not a desirable fit.
This model also under-performs on boundary points.
( Boundary Effect ).
Our model tries to find the nearest points based on Euclidean norm and averages them to get an output.
But for query points at the boundary, this method will select repetitive neighboring points which result in a similar output and thus flatten the regression line.
Nadaraya-Watson Kernel-Weighted Average RegressionIn the above method, one of the major drawbacks was the equal assignment of weights.
This method assigns weights to each point in a window of query point based on a specific Kernel.
Main intuition is that weights should decrease with increase in distance and more weights should be assigned for nearest points.
Figure 4: Equation of Weighted Average by Kernel FunctionThere are different kernel choices: Epanechnikov quadratic kernel, Tri-cube Kernel, and Gaussian Kernel.
For example in figure 3, we use the Epanechnikov Quadratic Kernel.
Figure 5: Weighted Kernel Regression with Epanechnikov KernelSo how does this Kernel assign weights?.For a start, it surely assumes a relationship between the distance of query point and the neighboring points.
To understand it, we will see the formulation of one of these kernels.
In this case, we use the Epanechnikov Kernel ( which is one of the widely used!)Figure 6: Formulation of Epanechnikov KernelWe see there that kernel weight will be zero if the point lies outside the local density.
One of the important parameters is window size ( Lambda in fig.
Window size determines the width of local neighborhood i.
e extent of the region considered by query point.
In the above example, we saw a fixed window size, but for more complex fits, we can use adaptive neighborhood size.
An advantage of using an adaptive neighborhood is to have a better estimate fit at boundary regions.
In conclusion, These non-parametric methods come very close to find the right fit and with an optimal parameter search, we can outperform traditional methods.
Although the setting of parameters in these methods is non-flexible which paves way for parametric approach of solving.
References: Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning.
(2008) Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, Introduction to Statistical Learning with Applications in R (2013) Regression, Machine Learning, Coursera- University of Washington.