There are several approaches to anomaly detection, such as density-based, clustering-based, and support-vector-machine-based methods.
Here, we will discuss a simple method based on moving averages: it flags points that lie more than a fixed number of standard deviations from the moving average as outliers in time series data.
Computing a mean over time-series data isn’t exactly trivial, because the data is not static.
You would need a rolling window to compute the average across the data points.
Technically, this is called a rolling average or a moving average, and it’s intended to smooth out short-term fluctuations and highlight long-term trends.
Mathematically, an n-period simple moving average can also be viewed as a “low-pass filter”.
You can find a very intuitive explanation of it here.
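As a reference for the notation, one common (trailing) form of the n-period simple moving average of a series y_t can be written as follows. The symbols are chosen here for illustration; a centered variant differs only in which n points are averaged.

```latex
\bar{y}_t = \frac{1}{n} \sum_{i=0}^{n-1} y_{t-i}
```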
Let's take a look at the algorithm:
First, we define a function that calculates the moving average of our time series data for a fixed window size using a discrete linear convolution.
Convolution is a mathematical operation that can be described as the integral of the product of two functions after one is reversed and shifted; for discrete data such as ours, the integral becomes a sum.
In this case, the operation calculates the average of each sliding window.
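A minimal sketch of this step, assuming NumPy is available. The function name `moving_average` and the choice of `mode="same"` are illustrative assumptions, not necessarily what the original implementation used.

```python
import numpy as np

def moving_average(values, window_size):
    """Simple moving average via a discrete linear convolution (illustrative helper)."""
    # Uniform kernel: every point in the window gets weight 1 / window_size.
    kernel = np.ones(window_size) / window_size
    # mode="same" keeps the output as long as the input. The edges are
    # implicitly zero-padded, so the first and last few averages are
    # biased toward zero.
    return np.convolve(values, kernel, mode="same")

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed = moving_average(y, window_size=3)
print(smoothed)
```

Because of the zero padding at the edges, the interior values are true window averages while the first and last values are not; `mode="valid"` would avoid this at the cost of a shorter output.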
After calculating the moving average, we calculate the difference between each y value and the corresponding average; we will call these differences residuals.
Next, we calculate the standard deviation of these residuals.
We define the outliers, or anomalies, as the points that lie above the moving average plus three standard deviations or below the moving average minus three standard deviations; equivalently, these are the points whose residual exceeds three standard deviations in absolute value.
The choice of three standard deviations is arbitrary and can be varied based on the distribution of the data and our use case.
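The steps above (moving average, residuals, standard deviation of the residuals, threshold test) can be sketched as follows. This is a hedged illustration rather than the original implementation: the function name `find_anomalies`, the parameter `n_sigmas`, and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np

def find_anomalies(y, window_size, n_sigmas=3.0):
    """Flag points whose residual from the moving average exceeds n_sigmas standard deviations."""
    y = np.asarray(y, dtype=float)
    # Moving average via convolution with a uniform kernel.
    kernel = np.ones(window_size) / window_size
    avg = np.convolve(y, kernel, mode="same")
    # Residuals: distance of each point from its local average.
    residuals = y - avg
    std = np.std(residuals)
    # A point is an outlier if it lies outside avg +/- n_sigmas * std.
    mask = np.abs(residuals) > n_sigmas * std
    return np.flatnonzero(mask), avg, std

# Synthetic data (not the article's data set): low-level noise with one injected spike.
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 0.1, size=200)
signal[120] += 5.0

indices, avg, std = find_anomalies(signal, window_size=20)
print(indices)
```

Note that a large outlier inflates the standard deviation of the residuals and also pulls the moving average toward itself, so smaller anomalies near a big spike can be masked; lowering `n_sigmas` or using a more robust spread estimate are possible adjustments.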
Let's take a look at the implementation in Python:
Here, I am trying to find anomalies in sample data of acceleration versus time, which will reveal unusual behavior in the acceleration of the vehicle.
These can mean sudden braking or a sudden increase in acceleration, which can be due to a possible obstacle, a collision, or a steep turn.
The following results were obtained on the sample data set using a rolling window of 20:
The red dots are the outliers and the green line is the mean value of the moving windows.
Now, these outliers have to be analysed further to determine whether they correspond to mere noise or to the critical events that we are interested in.
This algorithm gives us the points of interest from the entire duration of the time series data, which narrows down the search for us.
We can mine all such critical events from the huge amount of data we might have collected from customer or test cars, feed them to our autonomous vehicle systems, and see how the systems respond.
This can be very useful in validating autonomous vehicles, as critical events occur in less than 1% of the data we collect.