Yet Another Kalman Filter Explanation Article

It's time to know where those updating equations come from

Ben Ogorek · May 1

"A Puzzle without a picture" (who needs a picture when you have the Kalman Filter updating equations?)

There are many articles online that explain the Kalman Filter, and many of them are quite good.

They use autonomous robots and sensors as motivating examples, have nice visuals, and introduce the model equations and updates in an intuitive way.

But every one I’ve seen stops short of explaining the Kalman gain matrix and exactly where it comes from.

My favorite example is this highly regarded answer on the Mathematics Stack Exchange: "An Explanation of the Kalman Filter" (math.stackexchange.com).

It's excellent, right up until the point where the author prepares to explain the Kalman gain matrix, and instead finishes with "to be continued…" (that was 2014).

The disappointed readers beg for days for the rest of the answer, but they never get it.

So welcome to yet another Kalman Filter explanation article, the distinction being that this one contains a “friendly” derivation of the updating equations, all the way up to the end.

It is a Bayesian explanation but requires only a cursory understanding of posterior probability, relying on two properties of the multivariate Gaussian rather than specific Bayesian results.

As a bonus, this explanation will show us how the Kalman Filter is “optimal” by the very nature of what it is.

The derivation

Due to Medium's typesetting limitations, the derivation is provided in an embedded notebook below.

My contribution is at best a concise presentation; I was lost until I found slides from Simo Särkkä, a researcher who has since written a book on the more general topic of Bayesian Filtering and Smoothing.
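For readers who want a quick reference alongside the notebook, here is a minimal NumPy sketch of the standard predict-and-update cycle in its textbook form (the matrix names A, Q, H, and R follow the usual state-space conventions and are not necessarily the notebook's notation):

```python
import numpy as np

def kalman_step(m, P, y, A, Q, H, R):
    """One predict + update cycle of the Kalman Filter.

    m, P : posterior mean and covariance from the previous step
    A, Q : state transition matrix and process noise covariance
    H, R : measurement matrix and measurement noise covariance
    y    : new observation
    """
    # Predict: push the previous posterior through the state-space model
    m_prior = A @ m
    P_prior = A @ P @ A.T + Q

    # Update: condition the Gaussian prior on the new measurement
    S = H @ P_prior @ H.T + R             # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)  # Kalman gain
    m_post = m_prior + K @ (y - H @ m_prior)
    P_post = P_prior - K @ S @ K.T
    return m_post, P_post

# Scalar random-walk example: prior N(0, 1), process noise 1, observe y = 3
m1, P1 = kalman_step(np.array([0.0]), np.array([[1.0]]),
                     np.array([3.0]),
                     np.array([[1.0]]), np.array([[1.0]]),
                     np.array([[1.0]]), np.array([[1.0]]))
print(m1, P1)  # posterior mean 2.0, posterior variance 2/3
```

Note that the posterior covariance is written here as P − K S Kᵀ, the form that falls out of the Gaussian conditioning argument in the derivation.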

Reflecting on the derivation

I hope you made it through the arguments above (feedback is appreciated about parts that did not flow well).

At the end, there is the posterior:

The notation is slightly different from the updating equations on Wikipedia, but you can probably tell that the mean is the same as the "Updated (a posteriori) state estimate." The variance does not look similar to the "Updated (a posteriori) estimate covariance," but note in the paragraph below how the formula "further simplifies" to something that does look similar.
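The agreement between the two covariance formulas is easy to check numerically: substituting the definition of the gain, the conditioning form P − K S Kᵀ collapses to Wikipedia's (I − K H) P. A small check, with arbitrary illustration values for the matrices:

```python
import numpy as np

# Illustrative 2-state, 1-measurement setup (values are arbitrary)
P = np.array([[2.0, 0.3], [0.3, 1.0]])  # prior covariance
H = np.array([[1.0, 0.0]])              # measurement matrix
R = np.array([[0.5]])                   # measurement noise covariance

S = H @ P @ H.T + R                     # innovation covariance
K = P @ H.T @ np.linalg.inv(S)          # Kalman gain

# The form produced by the Gaussian conditioning derivation...
P_derived = P - K @ S @ K.T
# ...and Wikipedia's "Updated (a posteriori) estimate covariance"
P_wiki = (np.eye(2) - K @ H) @ P

print(np.allclose(P_derived, P_wiki))  # True: the two forms agree
```

The algebra behind it: K S Kᵀ = P Hᵀ S⁻¹ S S⁻¹ H P = K H P, so P − K S Kᵀ = (I − K H) P.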

Optimality

A "Bayes action" is a rule that "minimizes the posterior expected value of a loss function." The Bayes action that minimizes the expected squared error loss of an unknown parameter, i.e., the mean squared error, is the mean of the posterior distribution.

In this article, the mean of the posterior distribution is the Kalman Filter updating equation for the state; this updating equation is a minimum mean squared error estimator because it is a posterior mean.
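This optimality claim can be checked by simulation: for a Gaussian posterior, the posterior expected loss E[(x − a)²] equals σ² + (a − μ)², which is minimized at a = μ. A Monte Carlo sketch (the mean 1.5 and standard deviation 0.8 are arbitrary toy values):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy Gaussian "posterior" (mean and sd are arbitrary illustration values)
post_mean, post_sd = 1.5, 0.8
draws = rng.normal(post_mean, post_sd, size=200_000)

# Posterior expected squared-error loss for a grid of candidate actions a
candidates = np.linspace(-1.0, 4.0, 501)
risk = [np.mean((draws - a) ** 2) for a in candidates]

best = candidates[int(np.argmin(risk))]
print(best)  # close to the posterior mean, 1.5
```

The minimizing action lands at the posterior mean, which is exactly why the Kalman Filter's state update, being a posterior mean, is a minimum mean squared error estimator.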

Final thoughts

During the derivation, I assumed that an existing Bayesian result would apply, simplifying it even further.

The measurement model looks like a regression with x_t as β, but I couldn’t get the Bayesian linear regression formulas to fit.

I did note that the rows of the "design" matrix H do not correspond to independent observations, and I couldn't figure out a way to transform the problem into a multivariate regression with one observation either.

If there is a way to shorten the derivation with an existing conjugate result, please share it in the comments.

For notation, I specifically chose not to use the k|k-1 notation from the Wikipedia article because I found it easy to get mixed up about what quantity I was actually working with.

After working through the derivation, however, I can understand that it does allow for fewer symbols and emphasizes the evolution of the quantities during the update step.

Any time I get sidetracked on something like this (and I am sidetracked), there’s always the question of whether or not it was worth it.

In this case, I think it was.

I feel I have a better understanding of the multivariate Gaussian’s extensive role in the Kalman Filter’s construction.

I have a better concept of the Kalman Filter prior, which incorporates not only the information up to the previous state but also from the state-space model to go one step further.

I may have a better intuition for choosing starting values for the state vector and variance matrix in a Kalman Filter implementation (we’ll see).

I hope you think it was worth it as well, and either way let me know in the comments!