This can be remedied by first applying a non-linear expansion to the time series S and then finding linear features of the expanded data. By doing this we find non-linear features of the original data. Let's create a new multivariate time series by stacking time-delayed copies of S on it. Next we do a cubic expansion of the data and extract the SFA features. A cubic expansion turns a 4-dimensional vector [a, b, c, d]ᵀ into a 34-dimensional vector containing all monomials of degree one to three in the coordinates, i.e. the elements t³, t²v, tvu, t², tv and t for distinct t, u, v ∈ {a, b, c, d} (20 cubic, 10 quadratic and 4 linear terms).

Keep in mind that the best number of time-delayed copies to add varies from problem to problem. Conversely, if the original data is too high-dimensional, dimensionality reduction is needed instead, for example with Principal Component Analysis. Consider the following, then, to be the hyperparameters of the method: the method of dimensionality expansion (or reduction), the output dimension after the expansion (or reduction), and the number of slow features to extract.

Now, after adding the time-delayed copies, the length of the time series has changed from 300 to 297, so the slow feature time series has length 297 as well. For nicer visualization here, we pad it back to length 300 by prepending the first value once and appending the last value twice. The features found by SFA have zero mean and unit variance, so we normalize D as well before visualizing the results. Even with only 300 data points, the SFA features manage to almost completely recover the underlying source, which is quite impressive!

2. So what's going on under the hood?

Theoretically, the SFA algorithm accepts as input a (multivariate) time series X and an integer m indicating the number of features to extract from the series, where m is less than the dimension of the time series. The algorithm determines m functions g₁, …, g_m such that for each feature yᵢ(t) = gᵢ(x(t)) the average of the squared difference between two successive time points is minimized. Intuitively, we want to maximize the slowness of the features:

minimize  Δ(yᵢ) = ⟨ẏᵢ²⟩  (1)
subject to  ⟨yᵢ⟩ = 0  (2)
            ⟨yᵢ²⟩ = 1  (3)
            ⟨yᵢ yⱼ⟩ = 0 for all j < i,  (4)

where ⟨·⟩ denotes averaging over time and the dot indicates the time derivative; in the discrete case ẏᵢ(t) = yᵢ(t) − yᵢ(t − 1). The objective function (1) measures the slowness of the feature. The zero-mean constraint (2) makes the second moment and the variance of the features equivalent and simplifies the notation. The unit-variance constraint (3) discards constant solutions. The final constraint (4) decorrelates our features and induces an ordering on their slowness. More details
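As a concrete sketch of the preprocessing described above, here is the time-delay stacking and cubic expansion in plain NumPy. The toy series, the choice of three delays, and the monomial enumeration are illustrative, not the article's exact code:

```python
import numpy as np
from itertools import combinations_with_replacement

# Toy stand-in for the series S of length 300 (illustrative data).
t = np.linspace(0, 2 * np.pi, 300)
S = np.sin(t) + 0.1 * np.sin(7 * t)

# Stack 3 time-delayed copies: row i holds [S[i], S[i+1], S[i+2], S[i+3]],
# so the series shortens from 300 to 297 points.
n_delays = 3
X = np.column_stack([S[i : len(S) - n_delays + i] for i in range(n_delays + 1)])

# Cubic expansion: every monomial of degree 1, 2 or 3 in the 4 coordinates,
# i.e. 4 + 10 + 20 = 34 elements per time point.
monomials = [
    X[:, list(idx)].prod(axis=1)
    for deg in (1, 2, 3)
    for idx in combinations_with_replacement(range(4), deg)
]
expanded = np.column_stack(monomials)

print(X.shape)         # (297, 4)
print(expanded.shape)  # (297, 34)
```

Enumerating monomials with `combinations_with_replacement` covers exactly the 34 elements counted above, since each multiset of coordinates of size one to three corresponds to one monomial.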
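A minimal linear SFA solving this optimization problem can be sketched as follows. The helper name `sfa` and the toy two-channel mixture are our own illustration, not the article's implementation; the recipe (center, whiten, eigendecompose the derivative covariance) is the standard one:

```python
import numpy as np

def sfa(X, m):
    """Minimal linear SFA sketch: X has shape (T, d); returns the m slowest features.

    Center the data (constraint 2), whiten it so that any orthonormal
    projection has unit variance and orthogonal projections are uncorrelated
    (constraints 3 and 4), then keep the m whitened directions along which
    the discrete time derivative has the least power (objective 1).
    """
    X = X - X.mean(axis=0)                    # zero mean, constraint (2)
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)
    Z = X @ (eigvecs / np.sqrt(eigvals))      # whitened data
    dZ = np.diff(Z, axis=0)                   # discrete time derivative
    dcov = dZ.T @ dZ / len(dZ)
    vals, vecs = np.linalg.eigh(dcov)         # ascending: slowest direction first
    return Z @ vecs[:, :m]

# Illustrative check: two mixtures of a slow and a fast oscillation.
t = np.linspace(0, 4 * np.pi, 500)
X = np.column_stack([np.sin(t) + np.sin(20 * t),
                     np.sin(t) - np.sin(20 * t)])
Y = sfa(X, 1)  # the slowest feature tracks sin(t) up to sign and small noise
```

Whitening is what makes constraints (3) and (4) fall out for free: in the whitened space every unit-norm projection has unit variance and orthogonal projections are uncorrelated, so the constrained problem reduces to an ordinary eigendecomposition of the derivative covariance, with the ascending eigenvalues providing the ordering by slowness.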