The Angles From Which We See

Demystifying PCA for a non-technical audience

Aileen · Jun 7

Imagine you are an artist who wants to draw a house on a piece of paper.
The essence of this task is to convert an object from a 3D world to a 2D plane.
If you are free to move around this house, which angle would you pick to make the drawing?

Two different angles to look at the house

What we want is the best angle to draw the house, such that the most important information about it, like the position of the window and the ratio of the height to the width, is well captured.
So we will probably choose to “project” the house onto the paper from an angle as shown on the left, rather than as shown on the right.
In machine learning, we deal with a similar problem.
The real-world dataset often has too many dimensions (features) to be visualized effectively.
For example, the purchasing behavior of a customer can be attributed to many factors, such as income, age, societal values, and a lot more.
And each of those factors would give a new dimension in the dataset.
It is, however, very difficult for humans to visualize a dataset of more than 3 dimensions. Therefore, in order for domain experts to look at the data and extract meaningful conclusions from it, we often need to reduce a dataset to a few major components.
This process of reducing dimensions can be achieved by Principal Component Analysis (PCA), a widely used approach in unsupervised learning.
To illustrate, we look at a two-dimensional dataset as plotted below on an x-y plane, and see how we go about reducing it to one dimension — which is a line.
Two angles to project the data onto a line

There are different ways of finding the right projection mathematically. Essentially, we are looking for the line with the minimum average squared (perpendicular) distance to all data points; equivalently, it is the line along which the projected data retain the maximum variance. By choosing the line on the left, most of the variation in the data is kept in the new dimension. The key idea of PCA, therefore, is to find the best angle from which to project the data.
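To make this concrete, here is a minimal sketch of that idea in code, using a small hypothetical 2-D dataset: the "best angle" is the top eigenvector of the data's covariance matrix, and projecting onto it reduces each point to a single number.

```python
import numpy as np

# Hypothetical 2-D dataset: each row is one data point.
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
              [1.5, 1.6], [1.1, 0.9]])

# Center the data, then find the direction of maximum variance:
# the top eigenvector of the covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order, so the last
# column is the first principal component (the "best angle").
pc1 = eigenvectors[:, -1]

# Project each 2-D point onto that line: one number per point.
X_1d = X_centered @ pc1
print(pc1)   # direction of the chosen line
print(X_1d)  # the dataset reduced to one dimension
```

Projecting onto the other eigenvector would also give one number per point, but it would preserve less of the variation in the data.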
In practical use cases, this can mean reducing the total number of features from thousands to hundreds, significantly downsizing the dataset.
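In practice, the number of components to keep is often chosen by looking at how much of the total variance each component explains. The sketch below, which assumes scikit-learn is installed and uses synthetic data with only a few underlying factors, keeps just enough components to explain 95% of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic dataset: 200 samples, 50 features, but driven by
# only 5 underlying factors, so most dimensions are redundant.
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Keep just enough components to explain 95% of the variance.
n_keep = int(np.searchsorted(cumulative, 0.95)) + 1
print(n_keep)  # a small number, far below the original 50
```

For data like this, a handful of components typically captures almost all of the variance, so the 50 original features can be replaced by only a few principal components.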
Here, however, we will look at a rather simple use case on customer positioning.
Suppose we want to understand our customers to derive insight into the type of product recommendations we offer.
One possibility is to look at their purchase records to understand the personal traits underlying their purchase behaviors.
We might conclude that some customers are more price-conscious, while others are faster tech adopters.
Customers’ speed of tech adoption vs. price sensitivity

Intuitively, a customer’s price sensitivity and speed of tech adoption can be reduced to one dimension that, in a way, captures the overall attractiveness of the customer to a high-tech firm.
Thus, we can represent each customer as a point on a new line (shown in red), where customers located further to the right are more attractive than those located to the left.
Customer attractiveness = weight1 × p + weight2 × t + c

In general, each principal component is a linear combination of the original features. Likewise, the customer attractiveness here is a weighted sum of price sensitivity (p) and speed of tech adoption (t), plus some constant value (c).
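This weighted sum can be computed directly with a PCA library. The sketch below assumes scikit-learn is installed and uses made-up customer scores for p and t; the fitted component holds the two weights, and the centering step plays the role of the constant c.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical customer data: columns are price sensitivity (p)
# and speed of tech adoption (t), one row per customer.
customers = np.array([[0.9, 0.2], [0.8, 0.3], [0.3, 0.8],
                      [0.2, 0.9], [0.6, 0.5], [0.4, 0.7]])

# Reduce the two features to a single "attractiveness" score.
pca = PCA(n_components=1)
attractiveness = pca.fit_transform(customers).ravel()

# The component holds the weights for p and t; subtracting the
# feature means during fitting supplies the constant offset c.
weight_p, weight_t = pca.components_[0]
print(weight_p, weight_t)  # the learned weights
print(attractiveness)      # one score per customer
```

Note that the sign of a principal component is arbitrary, so "more attractive" may come out as larger or smaller scores; flipping the sign of the weights does not change the information captured.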
In a world as complicated as ours, we constantly choose the angle from which to perceive things.
From some angles, we perceive something meaningful, and from some others, we do not.
Just like an artist chooses an angle to draw a house, PCA is about the choice of that right angle from which we see.