From a visual perspective, it feels pretty clear that there are two populations with two linear trends.

The two groups are mixed together into one undistinguished set of points.

Here’s another one: Can you separate the two components in this masterpiece, and identify the person in the picture?I’m not religious but this has me thinking…Again, this shouldn’t be a problem for us.

We can mentally separate the two images, but if someone asked for the two separated images, how would you actually do it?We want a mathematical framework for this process of separating a mixed dataset, and an algorithm to do so.

DefinitionBefore we think about this mathematical framework, let’s write it out in simple ideas.

We have data that is the combination of different parts.

We want to separate it into a simpler representation.

We want to have as little mixup as possible.

Suppose we have m initial points:that we would like to separate into n parts:We want to find a transformation that separates the initial points.

The simplest form for such a transformation is a n-by-m matrix W.

To ensure that the n outputs are separated, we can search for the transformation that makes the components are as different from the Normal distribution as possible.

If you’ve taken a Statistics class, then you’ve probably heard about the Normal distribution (Bell curve).

One of the most important results in all of Statistics is the Central Limit Theorem, which states that adding together a bunch of independent samples (from even a non-Normal distribution) results in a Normal distribution.

The different groups or images that are all combined form a collection of mixed up points that often are Normally distributed.

We wish to extract components that individually are as non-Normally distributed as possible.

KurtosisTwo common metrics for describing a probability distribution are the mean and standard deviation.

They represent the first and second moment of a distribution, and often encapsulate most of the information about the average and the spread.

They are defined as:We can also talk about the third moment of a distribution, called the kurtosis.

The kurtosis measures how skewed a distribution is.

We define it as:A Normal distribution has a kurtosis of 0.

If we maximize the kurtosis, we will end up with a distribution that is as non-Normal as possible.

SummaryIn this article, we motivated our natural ability to separate data that is the composition of multiple sources.

From scattered datasets to filtered images, these are easy for our brains to separate, but more interesting to think about from an algorithmic perspective.

With the idea of keeping each component as different as possible, we can start to build an algorithm.

Find the transformation that results in a distribution that has the maximum kurtosis.

We can update our current candidate by using gradient descent, and continue until we have separated our original data into different parts.

Thanks for reading!.Independent Component Analysis is a great tool and the entry point to solving a handful of interesting problems.

I’d love to hear your feedback and see if there is any interest in this topic — I barely scratched the surface.

.