An introduction to SVD and its widely used applicationsNathan ToubianaBlockedUnblockFollowFollowingJun 1If you’re in the data science world (or close to it), you probably already heard of SVD a thousand times even if you haven’t used it.
Whether it’s for PCA (Principal Components Analysis) or recommendation algorithms, SVD is a powerful technique widely used today in a lot of models — we’ll describe what it is and show how it’s used in some key methods.
Note: we won’t cover the code part but only the theory.
What’s SVD?SVD means Singular Value Decomposition.
The SVD of a matrix X of dimension n×d is given by:Where:U and V are squared orthogonal:D is diagonal of dimension d×nSome additional notes:D is not squaredThe SVD of a matrix can be done for any matrixSVD is different from the eigenvalue decomposition of a matrix.
Let’s define the eigenvalue decomposition of a matrix.
First, it exists for a matrix X if and only if X is squared and the eigenvectors form a base in the matrix dimension space.
If that’s the case, then one can write:where P is the matrix of the eigenvectors and Delta is a diagonal matrix of the eigenvalues of X — here, Delta is squared.
In some sense, SVD is a generalization of eigenvalue decomposition since it can be applied to any matrix.
SVD used in PCAPCA means Principal Components Analysis.
Given an input matrix X, it consists in finding components p_i that are linear combinations of the original coordinates:in such a way that:The components are orthogonal (E[p_ip_j]=0)The components are ordered in such a way that p_1 explains the largest percentage of variance, and p_2 has the second largest (that has not been explained by p_1) and so on.
We assume here that X is normalized (E(x_i)=0 and Var(x_i)=1 for each feature).
One can show that the components p are given bywhere Gamma is the matrix found when doing the eigenvalue decomposition of the variance-covariance matrix of X.
Note that this is possible because the variance-covariance matrix is symmetric and any symmetric matrix has an eigen value decomposition.
Also note that the otrhogonality of Gamma impliesNote also that since E(x)=0, E(p)=0.
Andwhere Delta is the matrix of the eigenvalues of the variance-covariance matrix of X placed in descending order.
The key thing to remember here is that the principal components of X are found through a linear transformation using the eigenvectors of the variance-covariance matrix.
Once we have the components, we can reduce our data by keeping only the top k (k being arbitrary) components.
But do we really need to compute this eigenvalue decomposition?No!.And that’s where SVD comes in.
Indeed, using the formulas of SVD above, the eigenvectors of the variance-covariance matrix are given by the matrix U of the SVD of X.
And the eigen values are the squares of the singular values from the SVD of X.
We can thus use SVD to perform PCA, and keep the top k singular values of X to approximate given the top k principal components.
In some common cases, doing PCA through SVD is numerically more efficient than through the eigenvalue decomposition of the variance-covariance matrix.
SVD used in matrix completionFor most recommendation algorithms, the input matrix being very sparse, matrix factorization methods are key.
SVD plays a central role in it.
The general matrix factorization problem formulated asUnder the constraint that the rank of M is less or equal than an arbitrary integer r.
One can prove that the solution of this problem iswhereand D_r is constructed from D by keeping just the first r singular values.
One can note that we can’t easily solve the above problem when the matrix X has missing values.
There are several methods to cope with that.
One of them is to start with an initial solution and to iterate by doing SVDs until convergence.
We won’t get into the details of the methods here, but the bottom line is that SVD can also be used in matrix completion, since it involves doing matrix factorization.