Behind The Models: Cholesky Decomposition

Now the cool part: using Cholesky decomposition we can solve systems of equations of any size in 2 steps.

Suppose we want to solve for a, b, and c: remember that this could also be written as a system of 3 equations.

A * x = b => solve for xOrdinarily we’d stack the 3 equations on top of each other, solve for one variable dependent on the other two, plug it in, etc.

Using Cholesky decomposition, we have a much more efficient method available.

Solving for x using Cholesky DecompositionA 3×3 matrix is a little underwhelming, but we can already begin to appreciate the efficiency of this method on a very large matrix.

Applications — Least-Squares RegressionA linear regression takes the form Y = X * β, where Y is a vector of dependent variables, and X is a vector of independent variables.

Least-squares regression refers to finding the vector β where the squared differences between the predicted and actual Y values (i.

e.

RSS = “Residual Sum of Squares” or “Sum of Squared Errors”) are minimized.

If that doesn’t make sense, focus on this one take-away: the Cholesky decomposition is roughly twice as efficient as other methods for solving systems of linear equations.

Applications — Monte-Carlo SimulationI saved the coolest application for last.

Imagine that you want to generate many correlated normal random variables, but don’t want to deal with a massive multi-variate normal.

Cholesky decomposition allows you to simulate uncorrelated normal variables and transform them into correlated noraml variables — cool!Assume 3 Normal(0,1) random variables we want to follow the covariance matrix below, representing the underlying correlation and standard deviation matrices:We find the Cholesky decomposition of the covariance matrix, and multiply that by the matrix of uncorrelated random variables to create correlated variables.

import numpy as npimport pandas as pdimport seaborn as snsimport matplotlib.

pyplot as pltx_uncor = np.

random.

normal(0, 1, (3, 10000))cov = np.

array([[ 10.

0, -2.

00, 2.

00], [-2.

00, 20.

00, 0.

50], [ 2.

00, 0.

50, 0.

50]])L = np.

linalg.

cholesky(cov)x_cor = np.

dot(L, x_uncor)corr_simulated = pd.

DataFrame(x_cor).

T.

corr()std_ = np.

sqrt(np.

diag(cov))corr_empirical = cov / np.

outer(std_, std_)x_uncor = pd.

DataFrame(x_uncor.

T)x_cor = pd.

DataFrame(x_cor.

T)sns.

pairplot(x_uncor, kind="reg")sns.

pairplot(x_cor, kind="reg")Voila!.We go from uncorrelated:To correlated:Consistent with the correlation and standard deviation matrices presented above, columns 0 and 2 have a strongly positive correlation, 0 and 1 slightly negative, 1 and 2 slightly positive.

The standard deviation of variable 2 is contained, while 0 and 1 are much wider.

Note that this does not work in the same way for non-normal random variables.

In the above example, our correlated variables maintained a normal distribution.

If we apply this method to gamma-generated random variables we see that the process does not hold.

Uncorrelated Gamma(1, 5) — everything looks good.

And Correlated:Here we see that the variables are no longer gamma distributed — this is most obviously clue is that variable 1 takes on negative values while the gamma distribution is strictly positive.

There is an easy back-door approximation that involves simulating correlated random variables, finding their inverse, and then drawing from the desired distribution using the inverse-correlated-normal values.

This is not exact, but can get the job done.

Exact methods tend to be fairly exotic.

ConclusionCholesky decomposition is a neat trick that underlies many machine learning applications, two of which were featured here: least-squares regression and Monte-Carlo simulation using correlated normal variables.

While linear algebra can be a little scary, it’s important to remember that it is just an efficient method for notating systems of linear equations, and that a high-level understanding of linear algebraic methods is crucial to understanding current machine learning algorithms.

.