In this case, SVMs and Naive Bayes would not be a good choice since they assume that the input features are independent.

The k-NN algorithm could still work however it relies on the notion of a similarity measure between input examples.

Now the question becomes how do we measure the similarity between two time series?How about Euclidean distance?The Euclidean distance between two time series Q and C of length n is defined asAt first glance, it seems like simply calculating the Euclidean distance between two time series would give us a good idea of the similarity between them.

After all, the Euclidean distance between identical time series is zero and the Euclidean distance between very different time series is large.

However, before we settle on Euclidean distance as a similarity measure we should clearly state our desired criteria for determining the similarity between two time series.

With a good similarity measure, small changes in two time series should result in small changes in their similarity.

With respect to Euclidean distance this is true for changes in the y-axis, but it is not true for changes in the time axis (i.

e.

compression and stretching).

Consider the following example.

import pandas as pdimport numpy as npimport matplotlib.

pylab as pltx=np.

linspace(0,50,100)ts1=pd.

Series(3.

1*np.

sin(x/1.

5)+3.

5)ts2=pd.

Series(2.

2*np.

sin(x/3.

5+2.

4)+3.

2)ts3=pd.

Series(0.

04*x+3.

0)ts1.

plot()ts2.

plot()ts3.

plot()plt.

ylim(-2,10)plt.

legend(['ts1','ts2','ts3'])plt.

show()In the above example, it is clear that ts1 and ts2 are most similar (they are both sin functions under different transformations).

ts3 is clearly the most different.

Let’s compute the Euclidean distance d(ts1,ts2) and d(ts1,ts3) to see if the Euclidean distance measure agrees with what our intuition tells us.

Let’s first create a function that computes the Euclidean distance between two time series.

def euclid_dist(t1,t2): return sqrt(sum((t1-t2)**2))Let’s now find the Euclidean distance between ts1 and ts2.

print euclid_dist(ts1,ts2)26.

959216038and the Euclidean distance between ts1 and ts3print euclid_dist(ts1,ts3)23.

1892491903This is not good because according to the Euclidean distance measure, ts1 is more similar to ts3 than to ts2 which contradicts our intuition.

This is the problem with using the Euclidean distance measure.

It often produced pessimistic similarity measures when it encounters distortion in the time axis.

The way to deal with this is to use dynamic time warping.

Dynamic Time WarpingDynamic time warping finds the optimal non-linear alignment between two time series.

The Euclidean distances between alignments are then much less susceptible to pessimistic similarity measurements due to distortion in the time axis.

There is a price to pay for this, however, because dynamic time warping is quadratic in the length of the time series used.

Dynamic time warping works in the following way.

Consider two time series Q and C of the same length n whereandThe first thing we do is construct an n×n matrix whose i, j^{th} element is the Euclidean distance between q_i and c_j.

We want to find a path through this matrix that minimizes the cumulative distance.

This path then determines the optimal alignment between the two time series.

It should be noted that it is possible for one point in a time series to be mapped to multiple points in the other time series.

Let’s call the path W wherewhere each element of W represents the distance between a point i in Q and a point j in C i.

e.

So we want to find the path with the minimum Euclidean distanceThe optimal path is found via dynamic programming, specifically the following recursive function.

def DTWDistance(s1, s2): DTW={} for i in range(len(s1)): DTW[(i, -1)] = float('inf') for i in range(len(s2)): DTW[(-1, i)] = float('inf') DTW[(-1, -1)] = 0 for i in range(len(s1)): for j in range(len(s2)): dist= (s1[i]-s2[j])**2 DTW[(i, j)] = dist + min(DTW[(i-1, j)],DTW[(i, j-1)], DTW[(i-1, j-1)]) return sqrt(DTW[len(s1)-1, len(s2)-1])Now let’s compute the Euclidean distance between ts1 and ts2 using dynamic time warping.

print DTWDistance(ts1,ts2)17.

9297184686and now the dynamic time warping distance between ts1 and ts3print DTWDistance(ts1,ts3)21.

5494948244As you can see, our results have changed from when we only used the Euclidean distance measure.

Now, in agreement with our intuition, ts2 is shown to be more similar to ts1 than ts3 is.

Speeding Up Dynamic Time WarpingDynamic time warping has a complexity of O(nm) where n is the length of the first time series and mm is the length of the second time series.

If you are performing dynamic time warping multiple times on long time series data, this can be prohibitively expensive.

However, there are a couple of ways to speed things up.

The first is to enforce a locality constraint.

This works under the assumption that it is unlikely for q_i and c_j to be matched if i and j are too far apart.

The threshold is determined by a window size w.

This way, only mappings within this window are considered which speeds up the inner loop.

The following is the modified code which includes the window size w.

def DTWDistance(s1, s2,w): DTW={} w = max(w, abs(len(s1)-len(s2))) for i in range(-1,len(s1)): for j in range(-1,len(s2)): DTW[(i, j)] = float('inf') DTW[(-1, -1)] = 0 for i in range(len(s1)): for j in range(max(0, i-w), min(len(s2), i+w)): dist= (s1[i]-s2[j])**2 DTW[(i, j)] = dist + min(DTW[(i-1, j)],DTW[(i, j-1)], DTW[(i-1, j-1)]) return sqrt(DTW[len(s1)-1, len(s2)-1])Let’s test this faster version.

print DTWDistance(ts1,ts2,10)18.

5965518384print DTWDistance(ts1,ts3,10)22.

4724828468Another way to speed things up is to use the LB Keogh lower bound of dynamic time warping.

It is defined aswhere U_i and L_i are upper and lower bounds for time series Q which are defined asandfor a reach r and I(⋅) is the indicator function.

It can be implemented with the following function.

def LB_Keogh(s1,s2,r): LB_sum=0 for ind,i in enumerate(s1): lower_bound=min(s2[(ind-r if ind-r>=0 else 0):(ind+r)]) upper_bound=max(s2[(ind-r if ind-r>=0 else 0):(ind+r)]) if i>upper_bound: LB_sum=LB_sum+(i-upper_bound)**2 elif i<lower_bound: LB_sum=LB_sum+(i-lower_bound)**2 return sqrt(LB_sum)Let’s now test on ts1 and ts2:print LB_Keogh(ts1,ts2,20)6.

25389235159and now ts1 and ts3.

print LB_Keogh(ts1,ts3,20)19.

9595478694The LB Keogh lower bound method is linear whereas dynamic time warping is quadratic in complexity which makes it very advantageous for searching over large sets of time series.

Classification and ClusteringNow that we have a reliable method to determine the similarity between two time series, we can use the k-NN algorithm for classification.

Empirically, the best results have come when k=1.

The following is the 1-NN algorithm that uses dynamic time warping Euclidean distance.

In this algorithm, train is the training set of time series examples where the class that the time series belongs to is appended to the end of the time series.

test is the test set whose corresponding classes you are trying to predict.

In this algorithm, for every time series in the test set, a search must be performed through all points in the training set so that the most similar point is found.

Given that dynamic time warping is quadratic, this can be very computationally expensive.

We can speed up classification using the LB Keogh lower bound.

Computing LB Keogh is much less expensive than performing dynamic time warping.

And since LBKeogh(Q, C)≤DTW(Q, C), we can eliminate time series that cannot possibly be more similar than the current most similar time series.

In this way, we are eliminating much unnecessary dynamic time warping computations.

from sklearn.

metrics import classification_reportdef knn(train,test,w): preds=[] for ind,i in enumerate(test): min_dist=float('inf') closest_seq=[] #print ind for j in train: if LB_Keogh(i[:-1],j[:-1],5)<min_dist: dist=DTWDistance(i[:-1],j[:-1],w) if dist<min_dist: min_dist=dist closest_seq=j preds.

append(closest_seq[-1]) return classification_report(test[:,-1],preds)Now let’s test it on some data.

We will use a window size of 4.

Although the code is sped up with the use of the LB Keogh bound and the dynamic time warping locality constraint, it may still take a few minutes to run.

train = np.

genfromtxt('datasets/train.

csv', delimiter=' ')test = np.

genfromtxt('datasets/test.

csv', delimiter=' ')print knn(train,test,4)precision recall f1-score support 1 1.

00 0.

96 0.

98 50 2 0.

96 1.

00 0.

98 50 3 1.

00 1.

00 1.

00 50 4 0.

98 1.

00 0.

99 50 5 1.

00 1.

00 1.

00 50 6 1.

00 0.

98 0.

99 50avg / total 0.

99 0.

99 0.

99 300The same idea can also be applied to k-means clustering.

In this algorithm, the number of clusters is set apriori and similar time series are clustered together.

import randomdef k_means_clust(data,num_clust,num_iter,w=5): centroids=random.

sample(data,num_clust) counter=0 for n in range(num_iter): counter+=1 print counter assignments={} #assign data points to clusters for ind,i in enumerate(data): min_dist=float('inf') closest_clust=None for c_ind,j in enumerate(centroids): if LB_Keogh(i,j,5)<min_dist: cur_dist=DTWDistance(i,j,w) if cur_dist<min_dist: min_dist=cur_dist closest_clust=c_ind if closest_clust in assignments: assignments[closest_clust].

append(ind) else: assignments[closest_clust]=[] #recalculate centroids of clusters for key in assignments: clust_sum=0 for k in assignments[key]: clust_sum=clust_sum+data[k] centroids[key]=[m/len(assignments[key]) for m in clust_sum] return centroidsLet’s test it on the entire data set (i.

e.

the training set and the test set stacked together).

train = np.

genfromtxt('datasets/train.

csv', delimiter=' ')test = np.

genfromtxt('datasets/test.

csv', delimiter=' ')data=np.

vstack((train[:,:-1],test[:,:-1]))import matplotlib.

pylab as pltcentroids=k_means_clust(data,4,10,4)for i in centroids: plt.

plot(i)plt.

show().