The Kalman Filter and External Control InputsBen OgorekBlockedUnblockFollowFollowingJun 19Block diagram of a control system by Wikimedia user Orzetto (CC BY-SA 4.
This article features a “controller” to the system, though not one designed to match a reference value.
In this article, you willUse the statsmodels Python module to implement a Kalman Filter model with external control inputs,Use Maximum Likelihood to estimate unknown parameters in the Kalman Filter model matrices,See how cumulative impact can be modeled via the Kalman Filter.
(This article uses the fitness-fatigue model of athletic performance as an example and doubles as Modeling Cumulative Impact Part IV.
)The Kalman Filter with control InputsThe following is a specification of the Kalman Filter model with external “control” input B u_t:where q_t ∼ N(0, ????) and r_t ∼ N(0, ????).
The model matrices A, B, H, Q, and R may contain unknown parameters and are often allowed to vary through time.
The external “control-vector” input, u_t, must be known for all time points up to the present and — if the task requires predicting multiple time steps ahead — the future as well.
Presentations of the Kalman Filter from a statistician’s vantage point tend to zero the external control term out, since its not needed for the most popular dynamic time series models (where the internal system dynamics are the only forces at play).
This holds true for statistical software as well, and we’ll see the lack of an explicit control specification mechanism in statsmodels.
Fortunately, statsmodels does offer a state space model intercept that can be specified uniquely at each time point, and this is sufficient to incorporate external control inputs.
Using control inputs with statsmodelsTake a moment to familiarize yourself with the statsmodels statespace representation, which uses slightly different notation for the state equation than does this article.
Its state equation is written as:The “state_intercept” c_t, which was not included in our specification, is zero by default in statsmodels.
The description reads “c : state_intercept (k_states x nobs)” meaning that the user is free to specify a different value for the state intercept at every time point.
(This is true for all statsmodels Kalman Filter model matrices.
) Thus, we set:for t=1…T and have a Kalman Filter model with control inputs.
Example: the fitness-fatigue model in state space formThe fitness-fatigue model from Modeling Cumulative Impact Part I is:where p_t is (athletic) performance and w_t is training “dose” (time-weighted training intensity) at time t.
In previous articles in the Modeling Cumulative Impact series, convolutions of training history with impulse response functions have been the sole mechanism of representing the cumulative impact of the training sessions.
In this article a second tool will be used: the system state.
In order to do that, we must put the fitness-fatigue model in state-space form, with training dose as the external control input.
In the above equation, the first convolution sum represents athletic fitness, which I’ll now call h_t.
Below is the result of lagging one time step:Separating the final term in the convolution sum defining h_t, we arrive at the recursion:This argument proceeds in the same manner for fatigue’s convolution sum, called g_t henceforth, and the recursive relationship for both fitness and fatigue can be expressed in the following “state-space form”:We can continue using matrices to express the second “measurement” stage of the model with:where r_t ~ N(0, σ²).
The control input is lagged by one time period, which we’ll have to account for, but otherwise we have a typical state-space formulation of the fitness-fatigue model with exogenous control inputs.
Combined with the measurement model of performance given the current state of fitness and fatigue, the Kalman Filter toolkit (state estimation, easy imputation, and likelihood evaluation) is at our disposal.
This section will use simulated data that can be reproduced from the R gist that was used in Modeling Cumulative Impact Part II, which is also available to the reader as a csv file.
To run the code below, change the file path in the following Python code block:import numpy as npimport pandas as pdimport matplotlib.
pyplot as pltimport statsmodels as smfrom statsmodels.
mlemodel import MLEModeltrain_df = pd.
head()This block loads the required dependencies and prints a few lines of our input data set train_df:day day_of_week period w perf0 1 0 build-up 10 489.
1973631 2 1 build-up 40 500.
5453122 3 2 build-up 42 479.
8866483 4 3 build-up 31 474.
2268654 5 4 build-up 46 459.
322820To create a Kalman Filter model in statsmodels, we must extend the MLEModel base class (from the mlemodel module).
class FitnessFatigue(MLEModel): start_params = [20, 5, .
15, 400, 35] param_names = ['tau_1', 'tau_2', 'k1', 'k2', 'int', 'sigma2_e'] def __init__(self, perf, training): super().
__init__(perf, k_states = 2) self.
training_lag1 = (training .
shift(periods = 1, fill_value = 0).
loglikelihood_burn = 15 #self.
array([0, 0]), 2 * np.
eye(2)) def exp_decay(self, tau): return np.
0 / tau) def update(self, params, **kwargs): params = super().
update(params, **kwargs) # Preparing quantities decay_vec = np.
exp_decay(params)], (2, 1)) input_lag1_vecT = np.
training_lag1, (1, -1)) # state space model self['transition'] = np.
squeeze(decay_vec)) self['state_intercept'] = np.
matmul(decay_vec, input_lag1_vecT) # measurement model self['design', 0, 0] = params self['design', 0, 1] = params self['obs_intercept', 0, 0] = params self['obs_cov', 0, 0] = paramsA few things to note about the class above:We had to put in starting values in start_params.
Note that they are not especially close to the true values.
We have to specify an __init__ method that will accept data.
In this case it must accept both performance measurements and training (i.
, control) data.
Note the creation of the lag training variable within __init__.
The fitness-fatigue model specifically specifies a one time period lag before a training event can impact fitness or fatigue.
We’re able to avoid specifying the selection matrix which premultiplies the state error term in the statsmodels representation.
That defaults to a matrix of zeros that we’d typically be in trouble for not setting, but the fitness-fatigue model, when directly translated to state space form, has no state error term.
Next, instantiate the object with the data and estimate the unknown parameters using maximum likelihood:my_FitnessFatigue = FitnessFatigue(train_df.
w) mle_results = my_FitnessFatigue.
fit(method = 'bfgs', maxiter = 1000)mle_results.
summary()The last command produces the following output (spacing has been modified):Statespace Model Results===================================================================Dep.
Variable: perf No.
Observations: 259Model: FitnessFatigue Log Likelihood -826.
457Date: Tue, 18 Jun 2019 AIC 1664.
915Time: 07:32:59 BIC 1685.
898Sample: 0 HQIC 1673.
366 – 259Covariance Type: opg================================================================== coef std err z P>|z| [0.
961==================================================================Ljung-Box (Q): 38.
86 Jarque-Bera (JB): 0.
52 Prob(JB): 0.
83Heteroskedasticity (H): 1.
29 Skew: 0.
07Prob(H) (two-sided): 0.
25 Kurtosis: 2.
87===================================================================Warnings: Covariance matrix calculated using the outer product of gradients (complex-step).
The output shows that the Kalman Filter has done a good job recovering parameter values used in the simulation.
Standard errors, which required a custom nonlinear approximation in Modeling Cumulative Impact Part III, now are available “out of the box.
” On the other hand, the maximum likelihood procedure did require some tuning in this situation; increasing the minimum number of iterations and choosing the BFGS method led to a stable fit.
The reader is encouraged to repeat the exercise above with the “approximate diffuse” initialization replaced by the known initialization (currently commented out).
Unlike in The Kalman Filter and Maximum Likelihood, the results using the known initialization are somewhat different, especially the standard errors.
When the known initialization is used, the standard error estimates from the Kalman Filter are similar to those estimated using the nonlinear approximation.
With the approximate diffuse initialization, they are considerably larger for some parameters (especially the “time constants” in the exponentials).
The Kalman Filter gives us access to both filtered state estimates, which use only the data available up to a particular time point, and smoothed state estimates, which incorporate all available data into each time point’s state estimate.
Below we’ll visualize the filtered state estimates, which naturally experience a rough start:fig = plt.
figure(figsize = (12, 8))plt.
filtered_state[0, :], color='green', label='fitness')plt.
filtered_state[1, :], color='red', label='fatigue')plt.
title("Filtered estimates of state vector (Fitness and " + "Fatigue) over time")plt.
ylabel('Filtered state estimate')plt.
show()Given that the true fitness state is initialized at 0 and gradually travels higher, the first few filtered state estimates are off by a lot and explain why likelihood_burn was set to 15.
Replacing filtered_state with smoothed_state in the graphing code shows a picture that is much more similar to those in Modeling Cumulative Impact Part I.
DiscussionBefore writing this article, I thought the only way to get a Kalman Filter with control inputs was to pay for MATLAB or SAS/ETS .
And while statsmodels could make the specification a little more straightforward, adding in a control input to its Kalman Filter routines is still easily accomplished using the time-varying state intercept.
Subclass derivation for defining a model, like that used in statsmodels, is sleek but makes debugging less transparent.
(This article does not highlight the hours where these examples were not working.
) Remember that the resulting objects contain their model matrices; printing them out is a good way to debug!.Don’t forget about the “selection matrix” which multiplies the state error in statsmodels — it defaults to zero.
We didn’t need it here because the fitness-fatigue model doesn’t specify state error, but it can lead to the frustrating debugging scenario where code changes aren’t affecting the output.
Though the lack of a state error term made the current task simpler due to statsmodels defaults, the ability to add a state error term is an advantage of the Kalman Filter over the original fitness-fatigue model.
It’s easy to imagine things besides training that affect the latent fitness and fatigue, including sleep quality and nutrition.
Adding a state error term to the original fitness-fatigue model is another variation to be explored.
Unlike in The Kalman Filter and Maximum Likelihood, the “approximate diffuse” initialization led to different results than a known initialization of the state.
In that article, perhaps the stationary ARMA(1, 2) model was too much of a softball, and that uncertainty about the initial state would really only matter in a non-stationary situation without the mean to fall back on.
Nevertheless, it makes me curious about how different the “exact diffuse” initialization would have performed had it been implemented in statsmodels.
The Kalman Filter is a very powerful tool for time series analysis and modeling.
Not only is it able to calculate difficult likelihoods of classical time series models, but it can handle non-stationary models with exogenous control inputs and even guide the next space shuttle to the moon.
As data scientists, we’re lucky to have a free open-source implementation of the Kalman Filter, one that is sufficiently flexible for both statistical and engineering purposes, in statsmodels.
.. More details