That would mean constraining the space in which we can search for our optima, making this an example of constrained optimization.
In a way, un-constrained optimization is just a special case of constrained optimization.
Equality constraintsFor unconstrained minimization/maximization problems, we simply look for a point where the gradient is the zero vector.
If the gradient is not zero, we just take a small step in the direction opposite to where it is pointing (if we’re minimizing; along it if we’re maximizing) and keep doing this until we do get to a point where it is zero and hence, there is no where else to go (this optimization method is called gradient descent).
Note that we don’t have to move exactly along the gradient.
As long as we move in a direction that has a positive projection (shadow) along the gradient, we end up increasing the objective function (and decreasing if we have a positive projection along the negative gradient).
The figure below illustrates this.
The green arrow is the gradient of the blue plane and the red arrow is the negative gradient.
Since the light-blue arrows lie on the plane, the equation of the plane will yield 0 if we take a step along them.
The yellow arrows have a positive shadow (projection) along the green one.
So, moving along them will result in points that yield positive numbers when plugged into the equation of the plane (“increase” it).
Similarly, the pink arrows have a positive shadow along the red one (reverse gradient).
So, moving along them will result in points that yield negative numbers when plugged into the equation of the plane (“decrease” it).
Fig 2: Vectors on opposite sides of a plane.
See text for explanation.
Made using https://github.
com/ryu577/pyrayFor un-constrained minimization, we looked for a point where the gradient was zero.
This was because if it wasn’t, we could decrease the objective function by going opposite to the gradient.
The same idea can be extended to the case when we have equality constraints.
Like before, we need to find a point where we can’t find any possible direction to move where the objective function decreases.
For unconstrained optimization, this simply meant that no such direction exists.
When we have a constraint, there is another possibility.
What if a direction that decreases the objective function exists, but the constraints forbid us from taking any step along it.
Let’s say you want to maximize the amount of money in your bank account.
One way to immediately boost your earnings is to sell a kidney.
But you probably have a constraint saying you’re not going to lose a vital organ.
So, even though an easy path to increasing your earnings exists, your constraint prevents you from accessing it.
This means that the presence of the equality constraint actually reduces the strictness of the condition on the gradient.
While it needed to be zero for a local optima without the equality constraint, it’s now okay for it to be non-zero as long as moving in any direction that has a positive shadow along it causes us to violate the constraint.
This can only happen when the plane of the constraint is perpendicular to the gradient (like the plane and green arrow in figure 2).
Let’s go back to our objective function f(x,y)=x²+y².
Let’s add an equality constraint, y=1.
This is a plane.
In figure 3 below, the objective function is in pink and the plane is blue.
Since we’re constrained to stay on the plane, we can’t move in any direction that has a positive or negative shadow along the gradient of the plane (blue arrows in the figure below) since that would increase or decrease the equation of the constraint, while we want to keep it constant.
The plane intersects the equation of our objective function (pink paraboloid) in a curve which is a parabola.
The pink arrows in the figure below are the gradients of the objective function at various points along this parabola.
If the pink arrow has a projection along the blue plane, we can just move in the direction opposite to the vector corresponding to that projection.
This will keep us on the plane, ensuring we don’t violate the constraint while still reducing the objective function.
However, at the green point in figure 3 below, the pink arrow (gradient of objective function) has no projection whatsoever along the blue plane.
In other words, the pink arrow is parallel to the blue arrow (which is the gradient of the constraint plane).
Fig 3: Constraint gradient aligns with objective function gradient at optima.
Made using https://github.
com/ryu577/pyrayTo decrease the objective function, we need to move in a direction that has a shadow along the negative gradient.
But as soon as we do that, we’ll end up leaving the plane of the constraint.
So, the constraint makes it impossible to decrease the objective function further at the green point.
This means it must be a local minima.
An easy way to check this condition is to require that the pink gradient of the objective function is parallel to the blue gradient of the constraint plane.
And if two vectors are parallel, we can write one as a multiple of the other.
Let’s call this multiple λ.
If the gradient of the objective function is ∇f and that of the constraint is ∇c, the condition above is:∇f = λ ∇cThe λ above is called the Lagrange multiplier.
So, we now have a concrete condition to check for when looking for the local optima of a constrained optimization problem.
Inequality constraintsInequality constraints mean you have to stay on one side of a boundary defining a constraint function as opposed to on it (which was the case for equality constraints).
For example, staying inside the boundary of a fence.
If we know how to deal with inequality constraints, we can solve any constrained optimization problem.
This is because equality constraints can be converted to inequality constraints.
Let’s say we require: c(x) = 0.
Another way to express this is: c(x)≥0 and c(x)≤ 0.
So, each equality constraint can always be replaced with two inequality constraints.
Just as constrained optimization with equality constraints can be handled with Lagrange multipliers as described in the previous section, so can constrained optimization with inequality constraints.
What sets the inequality constraint conditions apart from equality constraints is that the Lagrange multipliers for inequality constraints must be positive.
To see why, again consider taking a small step in a direction that has a positive component along the gradient.
If we can take a step along this direction (if we are maximizing; opposite to it if we are minimizing); we can’t be at a maxima/minima.
For inequality constraints, this translates to the Lagrange multiplier being positive.
To see why, let’s go back to the constrained optimization problem we considered earlier (figure 3).
Minimize: f(x,y) = x²+y²Subject to: c(x,y)=y-1=0Now, let’s change the equality constraint to inequality.
This can be done in two ways with completely different results.
We can either require:c(x,y) = y-1 ≥0.
In this case, the constraint allows for anything in front of the blue plane in figure 3.
It is easy to see that the green point in figure 3 continues to be a local optima.
Also, since the blue arrow representing the constraints gradient and the pink arrow representing the objective function’s gradient point in the same direction, we have:∇f = λ ∇cwith λ>0.
The other possibility is, c(x,y) = y-1≤0.
Now, the feasible region becomes everything behind the blue plane.
The constraint gradients will flip.
So, figure 3 will end up looking like this:Fig 4: Flipping the sign of inequality constraint from figure 3.
Note that now,The green point is no longer the local optima since we’re free to move to (0,0); which is the yellow point in figure 4 above.
At the green point, we still have ∇f=λ ∇c.
since the blue vector points opposite to the pink vector, we have λ<0.
So, it is clear that for an inequality constraint, the condition ∇f=λ ∇c indicates we’re at a local optima only when λ>0.
Putting all of this together, for a general optimization problem:Minimize f(x)Subject to:c_i(x) =0 for i ∈ Equalityc_i(x)≥0 for i ∈ InequalityWe get the full suite of conditions required to be at a local optima:Lagrange multiplier condition:∇f =∑_i λ_i ∇c_i(x) +∑_j λ_j ∇c_j(x); Eq(1)Where i ∈ Equality and j ∈ Inequality constraints.
c_i(x)=0 for all i; Eq(2)c_j(x)≥0 for all j; Eq(3)λ_j ≥ 0; Eq(4)Also note that for the two inequality constraint problems we considered, when we had y-1≥0, the green point in figure 3 was the solution.
At this point, we were on the constraint plane (y-1=0).
So, we actually had c(x)=0 and λ>0.
When we considered y-1≤0 on the other hand, the yellow point (x=0,y=0) in figure 4 became the local minima.
This point was also the solution to the un-constrained problem.
So, we simply had ∇f=0 here.
Since the Lagrange condition requires ∇f = λ ∇c, we get λ ∇c = 0.
Now, ∇c ≠0 at this point, which means we must have had: λ=0.
This means that if the constraint is active (c(x)=0), we should have λ≥0 while if it is not (c(x)≠ 0) we should have λ=0.
So, one of them should be zero in all cases.
This leads to the final condition (the complimentarity condition):λ_j c_j(x) = 0 for all j ∈ Inequality; Eq(5)Equations (1) through (5) are called the KKT conditions.
Note that we haven’t really provided rigorous proofs for them, just constructing them based on the simple example.
For a proof, the reader should refer to chapter 13 of the book by Nocedal and Wright.
A lot of people when they see these five equations feel like the problem has become even more complicated.
How do these equations actually help us solve constrained optimization problems.
The best way to get a feel for this is to go through some concrete examples.
In the next section, we take a sample problem to which we know the answer in advance and see how the KKT conditions help us correctly identify all local optima.
Example with codeSpecial cases of the generalized optimization problem involve a linear objective function and linear constraints.
This is called a linearly constrained linear program (LCLP).
The objective function and constraints can also be quadratic and an optimization problem like this is called a quadratically constrained quadratic program (QCQP).
There are software packages that are capable of solving these kinds of optimization problems for an obscenely large number of constraints (in the millions).
For simpler problems with a more manageable number of constraints however, we can leverage algorithms that can solve most (more general) polynomially constrained polynomial programs.
Which means that the objective function and constraints can be arbitrary polynomial functions.
This is because there is a general framework for solving systems of polynomial equations called “Buchberger’s algorithm” and the KKT conditions described above can be reduced to a system of polynomial equations.
I wrote a detailed blog on Buchberger’s algorithm for solving systems of polynomial equations here.
There is a python library called “sympy” that uses algorithms like this behind the scenes and solves general systems of polynomial equations.
So without further ado, let’s frame our first constrained optimization problem.
Equality constraintsMinimize: x³+y³Subject to: x²+y²=1Notice that the constraint (x²+y²=1) implies we’re on the boundary of a circle with unit radius.
So, we can say: x=cos(t), y=sin(t).
The objective function then becomes: sin³(t)+cos³(t).
If we plot this with t, we get the following graph:Fig 5: Objective function sin³(t)+cos³(t) plotted with t at the boundary of the constraint.
We can see that t=0, π/2 and 5π/4 correspond to local maxima while t=π/4, π and 3π/2 correpond to local maxima.
Now that we know the answer in advance, let’s see if the KKT conditions described above can find these as well.
Equation (1) gives (taking derivatives of objective function and constraint):[3x², 3y²] = λ[2x, 2y]Equating the two components of the vectors on the two sides leads to the two equations:3x²-2λx=03y²-2λy=0Equation (2) simply requires that the equality constraint be satisfied:x²+y²=1And since there are no inequality constraints, we don’t need equations (3) to (6).
Now, we can enter the three equations above into the symbolic equation solver the python library sympy provides.
This leads to the following result (all possible solutions to the system above with values of x, y and λ in that order):[(-1, 0, -3/2), (0, -1, -3/2), (0, 1, 3/2), (1, 0, 3/2), (-sqrt(2)/2, -sqrt(2)/2, -3*sqrt(2)/4), (sqrt(2)/2, sqrt(2)/2, 3*sqrt(2)/4)](-1,0) corresponds to t=π; (0,-1) corresponds to t=3π/2; (-sqrt(2)/2,-sqrt(2)/2) corresponds to t=5π/4 and (sqrt(2)/2,sqrt(2)/2) corresponds to t=π/4.
So, we can see that all the local maxima and local minima we identified above have been identified by the KKT conditions.
Now, we can simply find the maximum and minimum values of the objective function at these candidate points.
Inequality constraintsNow, let’s change our equality constraint from the problem above to an inequality constraint and see how that changes our solution.
Minimize: x³+y³Subject to: x²+y²≤1Where the constraint implied we could only be on the boundary of the unit circle in the previous case, we can now be anywhere inside it.
The full heat-map of the objective function within the constraint disk is plotted below (looks like a planet with a star somewhere around the top-right corner).
The red arrows are the gradients of the boundary of the constraint while the black ones are the gradients of the objective function.
Fig 6: x³+y³ plotted within the disk x²+y²≤1While the equality constrained problem was a one dimensional problem, this inequality constrained optimization problem is two dimensional.
While there are only two ways to approach a point in one dimension (from left or right); there are an infinite number of ways to approach it in two dimensions.
This means we need to beware of saddle points.
These are points which qualify to be optima, but are not really optima since they are maxima when you approach them from one direction but minima when you approach them from another.
The figure below shows what a saddle point looks like.
Fig 7: A saddle point.
Maxima when you approach it from one direction, but minima when you approach it from another.
So, we need to re-evaluate all the points that were local minima or maxima in the case of the equality constraint and make sure none of them become saddle points.
Figure 5 tells us that t=0 (x=1,y=0) is a local maxima when approached along the boundary.
And when we approach the point from inside the disk as well (say along the line joining x=0,y=0 to this point), the value of the objective function increases as we approach it.
So, t=0 is a local maxima no matter where we approach it from.
Using similar arguments (or noting the symmetry in x and y), so is t=π/2.
Similarly we can argue that t=π and t=3π/2 are local minima no matter where we approach them from inside the feasible region.
Looking at t=π/4 however, we note from figure 5 that approaching it along the boundary makes it a local minima.
However, approaching it from inside the disk (along the line joining the origin to this point for example) makes it a local maxima.
So, it is overall neither a local maxima nor a local minima.
Such a point is called a saddle point.
Using similar arguments, t=5π/4 is also a saddle point.
Now, let’s see what the KKT conditions say for our problem.
Plugging the objective function and constraints into the KKT equations (1) through (5) we get:Equations 6 (a) through (e).
To leverage polynomial equation solvers, we need to convert these to a system of polynomial equations.
The first two conditions (6-(a) and (b))are already equations.
The third one, x²+y²≤1 (6-(c)) is an inequality.
But we can convert it into an equality by introducing a slack variable, k; x²+y²+k²=1.
The last equation, λ≥0 is similarly an inequality, but we can do away with it if we simply replace λ with λ².
Now, we demonstrate how to enter these into the symbolic equation solving library python provides.
Code solving the KKT conditions for optimization problem mentioned earlier.
This produces the following result (the various solutions of the system above with variables x,y,λ,k in order):[(-1, 0, -sqrt(6)/2, 0), (-1, 0, sqrt(6)/2, 0), (0, -1, -sqrt(6)/2, 0), (0, -1, sqrt(6)/2, 0), (0, 0, 0, -1), (0, 0, 0, 1), (0, 1, -sqrt(6)*I/2, 0), (0, 1, sqrt(6)*I/2, 0), (1, 0, -sqrt(6)*I/2, 0), (1, 0, sqrt(6)*I/2, 0), (-sqrt(2)/2, -sqrt(2)/2, -2**(1/4)*sqrt(3)/2, 0), (-sqrt(2)/2, -sqrt(2)/2, 2**(1/4)*sqrt(3)/2, 0), (sqrt(2)/2, sqrt(2)/2, -2**(1/4)*sqrt(3)*I/2, 0), (sqrt(2)/2, sqrt(2)/2, 2**(1/4)*sqrt(3)*I/2, 0)]The capital ‘I’ in the solutions above refers to the square root of unity.
We want to reject these solutions since we require λ²≥0.
This means that the points that satisfy the KKT conditions are: (-1,0); (0,-1); (0,0); (-1/sqrt(2),-1/sqrt(2)).
As mentioned earlier, the points (-1,0) (corresponding to t=π) and (0,-1) (corresponding to t=3π/2) are the minima.
(0,0) and (-1/sqrt(2),-1/sqrt(2)) are saddle points that are also caught in the net.
But note that none of the local maxima are caught.
I’ll leave you with a small challenge.
Change the code above so that it catches the maxima instead of the minima.