Clean up your time series data with a Hampel filter

threshold, which is the standard value that people use.

I came up with 5 just through experimentation.
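To make the roles of the window width and the threshold concrete, here is a minimal base-R sketch of the standard Hampel identifier. It is an illustration under assumptions, not the library's code: `hampel_indices_sketch` is a hypothetical name, I'm assuming the 5 above is the window half-width `k`, `t0 = 3` is used as a common choice of threshold, and the series is synthetic.

```r
# Sketch of the Hampel identifier in base R (hypothetical helper, not a library function).
# k  = window half-width (the window covers 2k + 1 points)
# t0 = threshold, in units of the scaled median absolute deviation (MAD)
hampel_indices_sketch <- function(x, k, t0 = 3) {
  n <- length(x)
  ind <- integer(0)
  # Points within k of either end are skipped because they lack a full window
  for (i in (k + 1):(n - k)) {
    w <- x[(i - k):(i + k)]
    x0 <- median(w)                       # robust estimate of the local level
    s0 <- 1.4826 * median(abs(w - x0))    # scaled MAD, comparable to a standard deviation
    if (abs(x[i] - x0) > t0 * s0) {
      ind <- c(ind, i)
    }
  }
  ind
}

# Synthetic trending series with three injected spikes (made-up data)
set.seed(1)
x <- seq(0, 10, length.out = 200) + rnorm(200, sd = 0.1)
spikes <- c(60, 120, 170)
x[spikes] <- x[spikes] + 5

found <- hampel_indices_sketch(x, k = 5, t0 = 3)
print(found)
```

A wider window smooths over more of the local structure, while a larger threshold flags fewer points, which is why both usually need some experimentation.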

The ind attribute gives us the outlier indices.

The output:

42 50 113 145 161

Looking at the outliers I actually created, we can see that it caught all of them, but it caught some others too:

SP.hampel.outlier.times <- vector()
SP.hampel.outlier.values <- vector()
i <- 1
for (ind in SP.hampel$ind) {
  SP.hampel.outlier.times[i] <- SP.times[ind]
  SP.hampel.outlier.values[i] <- SP.with.outliers[ind]
  i <- i + 1
}
options(repr.plot.width=8, repr.plot.height=4)
plot(SP.with.outliers, col="dodgerblue2")
points(SP.hampel.outlier.times, SP.hampel.outlier.values, pch=4, col="red")
grid()

This looks pretty good.

The outliers for 42 and 50 came up just because they appeared in pretty flat areas of the chart.
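Here is a small worked example of why a flat stretch can trigger the filter. This is a sketch with made-up numbers, assuming the usual median/MAD rule with a threshold of 3; `flagged` is a hypothetical helper. In a flat window the MAD is tiny, so even a small deviation at the center exceeds the threshold, while the same deviation in a noisier window does not.

```r
# Hypothetical helper: does the center point of window w exceed t0 scaled MADs?
flagged <- function(w, t0 = 3) {
  center <- w[(length(w) + 1) / 2]
  x0 <- median(w)
  s0 <- 1.4826 * median(abs(w - x0))
  abs(center - x0) > t0 * s0
}

# Flat stretch: the center (10.08) sits only 0.08 above the median of 10.00
w_flat  <- c(10.00, 10.01, 10.00, 9.99, 10.00, 10.08, 10.01, 10.00, 9.99, 10.00, 10.01)
# Noisier stretch with the same center value and the same median
w_noisy <- c(10.0, 10.3, 9.8, 10.2, 9.7, 10.08, 10.3, 9.9, 9.95, 9.8, 10.1)

print(flagged(w_flat))   # TRUE: MAD is about 0.01, so 0.08 is far outside 3 scaled MADs
print(flagged(w_noisy))  # FALSE: MAD is about 0.2, so 0.08 is well within the threshold
```

Note that in the flat case the replacement value (the median, 10.00) is very close to the flagged value (10.08).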

That’s fine; it won’t hurt to replace them with what are likely to be very similar values.

Let’s do that now.

Replacing the outliers with reasonable values

It turns out that the hampel function returns not only the anomalous indices, but also a series with all the outliers replaced by so-called “imputed” values.

We can use the y attribute for this:

plot(SP, lwd=1.5, col="red")
lines(SP.hampel$y, lwd=1.5, col="dodgerblue2")

[Figure: Original (red) vs imputed (blue) values]

Here, the original series is red, and the Hampel series is blue.

As you can see, the imputed values are very reasonable, and the Hampel output provides a better basis for model fitting than the time series containing outliers.
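To give a feel for where imputed values like these can come from, here is a rough sketch that replaces flagged points with the median of their surrounding window, a common choice for Hampel-style imputation. I'm assuming that rule here rather than quoting the library; the series, the flagged indices, and the half-width are all made up for illustration.

```r
set.seed(42)
half_width <- 5                              # assumed window half-width
x <- 100 + cumsum(rnorm(150, sd = 0.2))      # synthetic series without outliers

# Pretend these indices were flagged by the filter, and inject outliers there
ind <- c(40, 90)
x_out <- x
x_out[ind] <- x_out[ind] + 10

# Running median over windows of 2 * half_width + 1 points;
# endrule = "keep" leaves the first and last half_width values untouched
running_median <- runmed(x_out, k = 2 * half_width + 1, endrule = "keep")

# Replace only the flagged points; everything else stays as observed
y <- x_out
y[ind] <- running_median[ind]

print(round(cbind(clean = x[ind], with_outlier = x_out[ind], imputed = y[ind]), 2))
```

Because a single spike cannot move the median of an 11-point window very far, the imputed values land close to the underlying clean series.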

Conclusion

In this post we learned how to apply Hampel filters to identify outliers in time series and replace them with imputed values.

Discovering the right window width for the filter might take some trial and error, but with some exploration you should be able to get pretty good results for many time series.
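One way to structure that trial and error is to count how many points get flagged at several candidate widths and then inspect the plots for the promising ones. The sketch below does this in base R on synthetic data; `count_flags` is a hypothetical helper, the candidate widths are arbitrary, and the threshold of 3 is an assumption.

```r
# Hypothetical helper: number of points flagged for half-width k and threshold t0
count_flags <- function(x, k, t0 = 3) {
  W <- embed(x, 2 * k + 1)                  # each row is one full window of 2k + 1 points
  centers <- x[(k + 1):(length(x) - k)]     # the point at the middle of each window
  med <- apply(W, 1, median)
  s0 <- apply(W, 1, mad)                    # stats::mad applies the 1.4826 scaling by default
  sum(abs(centers - med) > t0 * s0)
}

# Synthetic trending series with three injected spikes (made-up data)
set.seed(7)
x <- seq(0, 20, length.out = 300) + rnorm(300, sd = 0.3)
x[c(50, 150, 250)] <- x[c(50, 150, 250)] + 6

widths <- c(2, 5, 10, 20)
flags <- sapply(widths, function(k) count_flags(x, k))
print(data.frame(k = widths, flagged = flags))
```

A count far above the number of anomalies you expect suggests the window is too narrow or the threshold too low for that series.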

Though our example focused on a time series with trend and no seasonality, we can apply the filter to time series with seasonality too.

For example, here’s how the filter performs on beer sales data:

[Figure: The Hampel filter applied to seasonal beer sales data]

I created a separate notebook for applying the Hampel filter to beer sales: https://github.com/williewheeler/imputation-experiments/blob/master/hampel.ipynb.