SVMs find a line (or, in more than two dimensions, a hyperplane) between different classes of data such that the distance from that boundary to the next-closest data points on either side is maximized. In other words, support vector machines calculate a maximum-margin boundary that cleanly partitions the data points, which is why an SVM is called a maximum-margin classifier. The sample points lying on the edge of either side of the margin are labeled support vectors, with at least one support vector for each class of data.

This converts equations 4 and 5 into one equation. Equation 6, y(w·x + b) − 1 = 0, should hold for any samples that are classified as support vectors. Since our initial goal was to establish a margin that is as wide as possible, we must find a way to express the distance between the boundaries of the margin. Let's draw two vectors from the origin to support vectors in our negative and positive classes, labeling them x− and x+, respectively. Now we have enough vectors to calculate the width: take the dot product of our difference vector (x+ − x−) and our perpendicular vector w, then divide by the magnitude of w (equation 7). From equation 6, and knowing that y = +1 for positive samples and y = −1 for negative samples, we can do some algebraic reduction and rewrite equation 7 as equation 8: width = 2/‖w‖.

To maximize equation 8, a function with constraints, we must use Lagrange multipliers. This provides us a new function L to maximize without needing to consider the constraints separately. First, we differentiate L with respect to w and find that the vector w is a linear sum of all or some of the samples: w = Σᵢ αᵢyᵢxᵢ (equation 10). Differentiating L with respect to b gives Σᵢ αᵢyᵢ = 0 (equation 11). Plugging our value for w from equation 10 into equation 9, we end up with equation 12, which depends on the samples only through pairwise dot products. Further reduction gives us the decision rule: if the result of equation 14 is ≥ 0, our sample is in the + class.

So far we've examined cases in which our classes of sample points are linearly separable. In the case that our sample points are not linearly separable, we must transform them into a new space in which they are.
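The relationship between equations 10 and 14 can be made concrete with a tiny sketch in plain Python. The support vectors, multipliers, and offset below are made-up illustrative values (not from the text), chosen so that the equation 11 constraint Σαᵢyᵢ = 0 holds and both support vectors sit exactly on the margin; the sketch checks that the primal decision value w·x + b agrees with the dual form, which uses only dot products with the support vectors.

```python
# Toy 2-D example: two support vectors, one per class (illustrative values).
xs = [(1.0, 1.0), (3.0, 3.0)]   # support vectors x- and x+
ys = [-1.0, 1.0]                # class labels
alphas = [0.25, 0.25]           # Lagrange multipliers; sum(alpha_i * y_i) = 0
b = -2.0                        # offset chosen so y_i * (w.x_i + b) = 1 on both

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Equation 10: w is a linear combination of the support vectors.
w = [sum(a * y * x[k] for a, y, x in zip(alphas, ys, xs)) for k in range(2)]

def primal(x):
    # Decision value using w directly.
    return dot(w, x) + b

def dual(x):
    # Equation 14: classify using only dot products with the support vectors.
    return sum(a * y * dot(xi, x) for a, y, xi in zip(alphas, ys, xs)) + b

# The two forms agree on any sample point.
for x in [(0.0, 0.0), (4.0, 4.0), (2.0, 2.0)]:
    assert abs(primal(x) - dual(x)) < 1e-9
```

Here w works out to (0.5, 0.5), so the margin width 2/‖w‖ equals the distance between the two support vectors projected onto w, as equation 8 requires.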
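The transformation into a new space can be illustrated with a minimal one-dimensional example (the points and feature map below are illustrative, not from the text): no single threshold on x separates the two classes, but a quadratic feature map makes them linearly separable.

```python
# One class surrounds the other on the number line, so no threshold on x
# alone separates them.
pos = [-2.0, 2.0]   # + class
neg = [-0.5, 0.5]   # - class

def phi(x):
    # Hypothetical feature map into 2-D: (x, x^2).
    return (x, x * x)

# In the transformed space, the second coordinate z = x^2 separates the
# classes with the linear boundary z = 1.
assert all(phi(x)[1] > 1.0 for x in pos)
assert all(phi(x)[1] < 1.0 for x in neg)
```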