The joint density distribution of two independent non-Gaussian signals will be uniform on a square; see upper left plot in Figure 2 below.

Mixing these two signals with an orthogonal matrix will result in two signals that are now not independent anymore and have a uniform distribution on a parallelogram; see upper right plot in Figure 2.

Which means that if we are at the minimum or maximum value of one of our mixed signals we know the value of the other signal.

Therefore they are not independent anymore.

Doing the same with two Gaussian signals will result in something else (see lower panel of Figure 2).

The joint distribution of the source signals is completely symmetric and so is the joint distribution of the mixed signals.

Therefore it does not contain any information about the mixing matrix, the inverse of which we want to calculate.

It follows that in this case the ICA algorithm will fail.

Figure 2: Gaussian and non-Gaussian sources and their mixturesSo in summary for the ICA algorithm to work the following preconditions need to be met: Our sources are a (1) lineare mixture of (2) independent, (3) non-Gaussian signals.

So lets quickly check if our test signals from above meet these preconditions.

In the left plot below we see the sine wave signal plottet against the saw tooth signal while the color of each dot represents the noise component.

The signals are distributed on a square as expected for non-Gaussian random variables.

Likewise the mixed signals form a parallelogram in the right plot of Figure 3 which shows that the mixed signals are not independent anymore.

Figure 3: Scatter plots of source and mixed signalsPre-processing stepsNow taking the mixed signals and feeding them directly into the ICA is not a good idea.

To get an optimal estimate of the independent components it is advisable to do some pre-processing of the data.

In the following the two most important pre-processing techniques are explained in more detail.

CenteringThe first pre-processing step we will discuss here is centering.

This is a simple subtraction of the mean from our input X.

As a result the centered mixed signals will have zero mean which implies that also our source signals s are of zero mean.

This simplifies the ICA calculation and the mean can later be added back.

The centering function in Python looks as follows.

>>> def center(x):>>> return x – np.

mean(x, axis=1, keepdims=True)WhiteningThe second pre-processing step that we need is whitening of our signals X.

The goal here is to linearly transform X so that potential correlations between the signals are removed and their variances equal unity.

As a result the covariance matrix of the whitened signals will be equal to the identity matrix:Where I is the identity matrix.

Since we also need to calculate the covariance during the whitening procedure we will write a small Python function for it.

>>> def covariance(x):>>> mean = np.

mean(x, axis=1, keepdims=True)>>> n = np.

shape(x)[1] – 1>>> m = x – mean>>> return (m.

dot(m.

T))/nThe code for the whitening step is shown below.

It is based on the Singular Value Decomposition (SVD) of the covariance matrix of X.

If you are interested in the details of this procedure I recommend this article.

>>> def whiten(x):>>> # Calculate the covariance matrix>>> coVarM = covariance(X) >>> # Singular value decoposition>>> U, S, V = np.

linalg.

svd(coVarM) >>> # Calculate diagonal matrix of eigenvalues>>> d = np.

diag(1.

0 / np.

sqrt(S)) >>> # Calculate whitening matrix>>> whiteM = np.

dot(U, np.

dot(d, U.

T)) >>> # Project onto whitening matrix>>> Xw = np.

dot(whiteM, X) >>> return Xw, whiteMImplementation of the FastICA AlgorithmOK, now that we have our pre-processing functions in place we can finally start implementing the ICA algorithm.

There are several ways of implementing the ICA based on the contrast function that measures independence.

Here we will use an approximation of negentropy in an ICA version called FastICA.

So how does it work?.As discussed above one precondition for ICA to work is that our source signals are non-Gaussian.

An interesting thing about two independent, non-Gaussian signals is that their sum is more Gaussian than any of the source signals.

Therefore we need to optimize W in a way that the resulting signals of Wx are as non-Gaussian as possible.

In order to do so we need a measure of gaussianity.

The simplest measure would be kurtosis, which is the fourth moment of the data and measures the “tailedness” of a distribution.

A normal distribution has a value of 3, a uniform distribution like the one we used in Figure 2 has a kurtosis < 3.

The implementation in Python is straight forward as can be seen from the code below which also calculates the other moments of the data.

The first moment is the mean, the second is the variance, the third is the skewness and the fourth is the kurtosis.

Here 3 is subtracted from the fourth moment so that a normal distribution has a kurtosis of 0.

>>> def kurtosis(x):>>> n = np.

shape(x)[0]>>> mean = np.

sum((x**1)/n) # Calculate the mean>>> var = np.

sum((x-mean)**2)/n # Calculate the variance>>> skew = np.

sum((x-mean)**3)/n # Calculate the skewness>>> kurt = np.

sum((x-mean)**4)/n # Calculate the kurtosis>>> kurt = kurt/(var**2)-3>>> return kurt, skew, var, meanFor our implementation of ICA however we will not use kurtosis as a contrast function but we can use it later to check our results.

Instead we will use the following contrast function g(u) and its first derivative g’(u):The FastICA algorithm uses the two above functions in the following way in a fixed-point iteration scheme:So according to the above what we have to do is to take a random guess for the weights of each component.

The dot product of the random weights and the mixed signals is passed into the two functions g and g’.

We then subtract the result of g’ from g and calculate the mean.

The result is our new weights vector.

Next we could directly divide the new weights vector by its norm and repeat the above until the weights do not change anymore.

There would be nothing wrong with that.

However the problem we are facing here is that in the iteration for the second component we might identify the same component as in the first iteration.

To solve this problem we have to decorrelate the new weights from the previously identified weights.

This is what is happening in the step between updating the weights and dividing by their norm.

In Python the implementation then looks as follows:>>> def fastIca(signals, alpha = 1, thresh=1e-8, iterations=5000):>>> m, n = signals.

shape>>> # Initialize random weights>>> W = np.

random.

rand(m, m)>>> for c in range(m):>>> w = W[c, :].

copy().

reshape(m, 1)>>> w = w/ np.

sqrt((w ** 2).

sum())>>> i = 0>>> lim = 100>>> while ((lim > thresh) & (i < iterations)):>>> # Dot product of weight and signal>>> ws = np.

dot(w.

T, signals)>>> # Pass w*s into contrast function g>>> wg = np.

tanh(ws * alpha).

T>>> # Pass w*s into g'>>> wg_ = (1 – np.

square(np.

tanh(ws))) * alpha>>> # Update weights wNew = (signals * wg.

T).

mean(axis=1) – >>> wg_.

mean() * w.

squeeze()>>> # Decorrelate weights >>> wNew = wNew – np.

dot(np.

dot(wNew, W[:c].

T), W[:c])>>> wNew = wNew / np.

sqrt((wNew ** 2).

sum())>>> # Calculate limit condition>>> lim = np.

abs(np.

abs((wNew * w).

sum()) – 1) >>> # Update weights>>> w = wNew >>> # Update counter>>> i += 1>>> W[c, :] = w.

T>>> return WSo now that we have all the code written up, lets run the whole thing!>>> # Center signals>>> Xc, meanX = center(X)>>> # Whiten mixed signals>>> Xw, whiteM = whiten(Xc)>>> # Run the ICA to estimate W>>> W = fastIca(Xw, alpha=1)>>> #Un-mix signals using W>>> unMixed = Xw.

T.

dot(W.

T)>>> # Subtract mean from the unmixed signals>>> unMixed = (unMixed.

T – meanX).

TThe results of the ICA are shown in Figure 4 below where the upper panel represents the original source signals and the lower panel the independent components retrieved by our ICA implementation.

And the result looks very good.

We got all three sources back!Figure 4: Results of the ICA analysis.

Above true sources; below recovered signals.

So finally lets check one last thing: The kurtosis of the signals.

As we can see in Figure 5 all of our mixed signals have a kurtosis of ≤ 1 whereas all recovered independent components have a kurtosis of 1.

5 which means they are less Gaussian than their sources.

This has to be the case since the ICA tries to maximize non-Gaussianity.

Also it nicely illustrates the fact mentioned above that the mixture of non-Gaussian signals will be more Gaussian than the sources.

Figure 5: Kernel Density Estimates of the three mixed and source signals.

So to summarize: We saw how ICA works and how to implement it from scratch in Python.

Of course there are many Python implementations available that can be directly used.

However it is always advisable to understand the underlying principle of the method to know when and how to use it.

If you are interested in diving deeper into ICA and learn about the details I recommend this paper by Aapo Hyvärinen and Erkki Oja, 2000.

Otherwise you can check out the complete code here, follow me on Twitter or connect via LinkedIn.

The code for this project can be found on Github.

akcarsten/Independent_Component_AnalysisFrom scratch Python implementation of the fast ICA algorithm – akcarsten/Independent_Component_Analysisgithub.

com.