Lagged Variable Regressions and Truth

Dynamic regression models offer vast representative power but also bias risk

Ben Ogorek, Jun 28

[Image: Variables related to each other over adjacent time steps, originally in the context of dynamic Bayesian networks (Wikimedia user Guillaume.lozenguez, CC BY-SA 4.0)]

In this article, we will:

- Turn a nonlinear structural time-series model into a regression on lagged variables using rational transfer functions and common filters,
- See bias in an ordinary least squares lagged variable regression due to remaining serial correlation in the errors,
- Use generalized least squares to eliminate the bias and recover the process parameters, i.e., the "truth."

"Near things are more related than distant things," states Tobler's (very believable) first law of geography.

However little we know about an underlying data generating mechanism, we can often exploit sheer nearness — in time or space — to gain an advantage.

But we also know that regressing on a time-shifted copy of the dependent variable, one that is affected by all the same factors, comes with caveats.

The biggest caveat is the potential for asymptotic bias in ordinary least squares (OLS) regression that no amount of additional data can alleviate.

This happens when regressors are contemporaneously correlated with the error term.

As explained by Peter Kennedy in A Guide to Econometrics, "This is because the OLS procedure, in assigning 'credit' to regressors for explaining variation in the dependent variable, assigns, in error, some of the disturbance-generated variation of the dependent variable to the regressor with which that disturbance is contemporaneously correlated." In short, OLS can't separate the effects of regressor and error when the two are correlated.

From a passive prediction standpoint, the bias might not matter, and “truth,” in this article’s title, refers to the true data generating process underlying a time series.

Henceforth, our goal is quality estimation of the process parameters.

For both an objective truth and a case where parameters have scientific meaning, I’ll turn to a nonlinear, nonstationary, structural time-series model, the “fitness-fatigue” model of athletic performance.

This model relates training history to athletic performance via latent fitness and fatigue constructs and is tuned by scientifically meaningful parameters.

By rewriting this model using rational transfer functions of input to output, a set of “common filtering” operations will become apparent.

Applying these filters will transform the model into a dynamic linear regression, but the filtering operations will also have a side effect: errors with an MA(2) correlation structure.

We will see the destructive power of contemporaneous correlation between regressors and error in OLS regression, but in the end, we will get to the truth via a GLS regression that accommodates that structure.

A nonlinear structural time-series model

The "fitness-fatigue" model of athletic performance has the form:

p_t = μ + k_1 h_t - k_2 g_t + ϵ_t,

where p_t is a numeric measure of (athletic) performance, w_t is training "dose," and ϵ_t is i.i.d. Gaussian error.

The model can be viewed as a linear regression with two nonlinear features,

h_t = Σ_{i=1}^{t-1} exp(-(t - i) / τ_1) w_i and g_t = Σ_{i=1}^{t-1} exp(-(t - i) / τ_2) w_i,

latent representations of athletic "fitness" and "fatigue," respectively.

These features are convolutions of athletic training history with exponential decay, differing only in the decay rate.

Typically, fatigue (g_t) is relatively transient and associated with a faster exponential decay, but also has a relatively large regression weight compared to fitness.
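As a concrete sketch, the two convolution features can be computed directly from a training-load vector. The doses and decay constants below are made up for illustration:

```r
# Exponentially-decayed convolution of training history, one value per day:
# feature[t] = sum over i < t of exp(-(t - i) / tau) * w[i]
decay_feature <- function(w, tau) {
  sapply(seq_along(w), function(t) {
    if (t == 1) return(0)  # no training history before day 1
    i <- 1:(t - 1)
    sum(exp(-(t - i) / tau) * w[i])
  })
}

w <- c(10, 0, 40, 0, 20)               # hypothetical training doses
fitness <- decay_feature(w, tau = 60)  # slow decay
fatigue <- decay_feature(w, tau = 13)  # fast decay
```

With the larger time constant, yesterday's training retains more of its value in today's fitness than in today's fatigue, which is what lets fatigue wash out while fitness persists.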

With a bit of algebra, fitness and fatigue can be put into the dynamic form:

h_t = θ_1 (h_{t-1} + w_{t-1}) and g_t = θ_2 (g_{t-1} + w_{t-1}),

where θ_1 = exp(-1 / τ_1) and θ_2 = exp(-1 / τ_2).
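That the recursive form reproduces the original exponential-decay convolution can be checked numerically. The sketch below uses a simulated training series invented just for the check:

```r
# Verify that the recursion h_t = theta * (h_{t-1} + w_{t-1}) matches the
# convolution h_t = sum over i < t of theta^(t - i) * w_i, theta = exp(-1 / tau).
set.seed(1)
w <- rpois(50, lambda = 5)  # made-up training doses
theta <- exp(-1 / 13)

# Direct convolution
h_conv <- sapply(seq_along(w), function(t) {
  if (t == 1) return(0)  # no training history before day 1
  i <- 1:(t - 1)
  sum(theta^(t - i) * w[i])
})

# Recursive form
h_rec <- numeric(length(w))
for (t in 2:length(w)) h_rec[t] <- theta * (h_rec[t - 1] + w[t - 1])

max(abs(h_conv - h_rec))  # numerically zero
```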

If we plug the dynamic representation of fitness and fatigue into the original model, we arrive at

p_t = μ + k_1 θ_1 (h_{t-1} + w_{t-1}) - k_2 θ_2 (g_{t-1} + w_{t-1}) + ϵ_t,

which looks nicer but is not immediately useful as h_t and g_t depend on unknown parameters.

Rational transfer functions to the rescue

Working a little harder, we can use the "backshift operator" B defined by B y_t = y_{t-1} for any time-indexed variable y, and arrive at

(1 - θ_1 B) h_t = θ_1 B w_t and (1 - θ_2 B) g_t = θ_2 B w_t.

Solving for h_t and g_t and plugging them back into the original model, we arrive at:

p_t = μ + k_1 [θ_1 B / (1 - θ_1 B)] w_t - k_2 [θ_2 B / (1 - θ_2 B)] w_t + ϵ_t.

Thus we have two rational transfer functions operating on the exogenous input series w_t.

With rational transfer functions, denominator terms of the form (1 - θB) correspond to an autoregressive impulse response, i.e., a process with a long memory, and this is a nuisance to us.

There is an option to rid ourselves of the denominator component, but not one without a cost.

A "common filter," as discussed in Identification of Multiple-Input Transfer Function Models (Liu & Hanssens, 1982), premultiplies the right and left-hand side of a time-series equation by (1 - θB).

It does not change the transfer function weights, so you can apply multiple common filters in succession; beyond some added complexity and a few rows lost to lags, the relationship between the input and output series is left intact.

The point is, if we use the common filter (1 - θ_1 B)(1 - θ_2 B), we rid ourselves of the autoregressive components:

(1 - θ_1 B)(1 - θ_2 B) p_t = (1 - θ_1 B)(1 - θ_2 B) μ + k_1 θ_1 B (1 - θ_2 B) w_t - k_2 θ_2 B (1 - θ_1 B) w_t + (1 - θ_1 B)(1 - θ_2 B) ϵ_t.

The above still looks ugly, but after expanding the polynomials and applying the backshift operations, we arrive at:

p_t = (1 - θ_1)(1 - θ_2) μ + (θ_1 + θ_2) p_{t-1} - θ_1 θ_2 p_{t-2} + (k_1 θ_1 - k_2 θ_2) w_{t-1} - θ_1 θ_2 (k_1 - k_2) w_{t-2} + ϵ_t - (θ_1 + θ_2) ϵ_{t-1} + θ_1 θ_2 ϵ_{t-2}.

This is a regression of performance on two lagged versions of itself and on the external training input, but one with a twist: the common filters have induced an MA(2) error structure in place of the original i.i.d. errors, and it has real consequences.
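The induced MA(2) structure can be seen directly in a small simulation: filter i.i.d. noise with (1 - θ_1 B)(1 - θ_2 B) and fit an MA(2) model to the result. This is an illustrative sketch, separate from the article's dataset:

```r
# Filtering iid noise with (1 - theta1 B)(1 - theta2 B) yields an MA(2)
# process whose coefficients, in R's arima() sign convention
# (u_t = e_t + ma1 * e_{t-1} + ma2 * e_{t-2}), are
# ma1 = -(theta1 + theta2) and ma2 = theta1 * theta2.
set.seed(42)
theta1 <- exp(-1 / 60)
theta2 <- exp(-1 / 13)
eps <- rnorm(5000)
n <- length(eps)

# Apply the common filter to the noise series
u <- eps[3:n] - (theta1 + theta2) * eps[2:(n - 1)] +
     theta1 * theta2 * eps[1:(n - 2)]

fit <- arima(u, order = c(0, 0, 2), include.mean = FALSE)
coef(fit)  # ma1 near -(theta1 + theta2); ma2 near theta1 * theta2
```

These are the same two quantities that the gls fit later estimates as Theta1 and Theta2.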

A simulated example in R

This section will use simulated data that can be reproduced from an R gist or downloaded directly as a csv file.

To run the R code below, change the file path in the following R code block:

```r
train_df <- read.csv("c:/devl/data/train_df.csv")
head(train_df)
```

The following calculations show the theoretical values of all the coefficients in the lagged variable model as simulated:

```r
# True parameter values:
mu     <- 496
theta1 <- exp(-1 / 60)
theta2 <- exp(-1 / 13)
k1 <- .07
k2 <- .27

# Theoretical coefficient for intercept is
(1 - theta1) * (1 - theta2) * mu

# Theoretical coefficient for performance lagged once is
theta1 + theta2

# Theoretical coefficient for performance lagged twice is
-theta1 * theta2

# Theoretical coefficient for training lagged once is
k1 * theta1 - k2 * theta2

# Theoretical coefficient for training lagged twice is
-theta1 * theta2 * (k1 - k2)
```

And thus the goal is to recover the following equation:

p_t = 0.607 + 1.909 p_{t-1} - 0.911 p_{t-2} - 0.181 w_{t-1} + 0.182 w_{t-2} + ϵ_t - 1.909 ϵ_{t-1} + 0.911 ϵ_{t-2}

Side note: To go the opposite direction, solve for θ_1 and θ_2 by substitution, which leads to a quadratic equation in one of the two parameters.

Solving for either parameter leads to the same two roots, so choose θ_1 to be the larger of the two.

With θ_1 and θ_2 in hand, solving for μ is trivial and k_1 and k_2 constitute a linear system with two equations and two unknowns.
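The back-solving described above can be sketched in a few lines of R, starting from the theoretical coefficient values (variable names here are mine, not from the article's gist):

```r
# Recover (theta1, theta2, mu, k1, k2) from the five lagged-model coefficients.
mu <- 496; theta1 <- exp(-1 / 60); theta2 <- exp(-1 / 13)
k1 <- 0.07; k2 <- 0.27

b0 <- (1 - theta1) * (1 - theta2) * mu  # intercept
b1 <- theta1 + theta2                   # perf lagged once
b2 <- -theta1 * theta2                  # perf lagged twice
c1 <- k1 * theta1 - k2 * theta2         # training lagged once
c2 <- -theta1 * theta2 * (k1 - k2)      # training lagged twice

# The thetas are the roots of x^2 - b1 * x - b2 = 0; take theta1 as the larger.
roots <- sort(Re(polyroot(c(-b2, -b1, 1))), decreasing = TRUE)
theta1_hat <- roots[1]
theta2_hat <- roots[2]

# mu is then immediate, and (k1, k2) solve a 2x2 linear system
mu_hat <- b0 / ((1 - theta1_hat) * (1 - theta2_hat))
A <- rbind(c(theta1_hat, -theta2_hat),
           c(-theta1_hat * theta2_hat, theta1_hat * theta2_hat))
k_hat <- solve(A, c(c1, c2))  # (k1, k2)
```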

Fitting the lagged regression model

To prepare for the regression, the code below lags performance and training variables once and twice each.

The dplyr package makes it easy:

```r
library(dplyr)

train_aug <- train_df %>%
  mutate(perf_lag1  = lag(perf, n = 1, order_by = day),
         perf_lag2  = lag(perf, n = 2, order_by = day),
         train_lag1 = lag(w, n = 1, order_by = day),
         train_lag2 = lag(w, n = 2, order_by = day))
```

Let's see what happens when we regress performance on these lagged variables using OLS:

```r
my_lm <- lm(perf ~ perf_lag1 + perf_lag2 + train_lag1 + train_lag2,
            data = train_aug[3:nrow(train_aug), ])
summary(my_lm)
```

The above code results in the partial output:

```
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.99821   12.09470    2.811 0.005327 ** 
perf_lag1    0.48066    0.05619    8.553 1.18e-15 ***
perf_lag2    0.46189    0.05504    8.393 3.45e-15 ***
train_lag1  -0.15602    0.04406   -3.541 0.000475 ***
train_lag2  -0.02346    0.04516   -0.520 0.603807    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.518 on 252 degrees of freedom
Multiple R-squared:  0.9117, Adjusted R-squared:  0.9103 
F-statistic: 650.4 on 4 and 252 DF,  p-value: < 2.2e-16
```

There is clearly bias in the OLS coefficient estimates as a result of the serial correlation remaining in the residuals.
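The mechanism behind this bias can be isolated in a tiny simulation: regress a series on its own lag when the errors are serially correlated, and OLS misstates the lag coefficient no matter how much data it sees. The ARMA(1,1) process below is hypothetical and independent of the article's dataset:

```r
# y_t = 0.5 * y_{t-1} + e_t + 0.8 * e_{t-1}: the lagged regressor y_{t-1}
# is contemporaneously correlated with the composite error e_t + 0.8 * e_{t-1}
# through the shared e_{t-1}, so OLS cannot recover phi = 0.5.
set.seed(2)
n <- 20000
y <- as.numeric(arima.sim(list(ar = 0.5, ma = 0.8), n = n))

ols_fit <- lm(y[2:n] ~ y[1:(n - 1)])
phi_hat <- coef(ols_fit)[2]
phi_hat  # settles near 0.75, not 0.5, and more data does not help
```

The limit here is the lag-1 autocorrelation of the ARMA(1,1) process, (1 + φθ)(φ + θ) / (1 + θ² + 2φθ) ≈ 0.746, rather than φ itself.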

Fortunately, R's nlme package offers generalized least squares (GLS) model fitting via the gls function, which handles the MA(2) error structure:

```r
library(nlme)

my_gls <- gls(perf ~ perf_lag1 + perf_lag2 + train_lag1 + train_lag2,
              data = train_aug[3:nrow(train_aug), ],
              corARMA(form = ~day, p = 0, q = 2))
summary(my_gls)
```

The following partial output comes from the above code:

```
Correlation Structure: ARMA(0,2)
 Formula: ~day 
 Parameter estimate(s):
    Theta1     Theta2 
-1.9059497  0.9117409 

Coefficients:
                 Value  Std.Error    t-value p-value
(Intercept)  0.6571088 0.11700730    5.61596       0
perf_lag1    1.9187158 0.00815689  235.22646       0
perf_lag2   -0.9200058 0.00815495 -112.81568       0
train_lag1  -0.1662026 0.02238219   -7.42566       0
train_lag2   0.1664704 0.02241510    7.42671       0
```

Via GLS regression, we've recovered the true parameter values to within plausible estimation error.

Discussion

Previous articles on the fitness-fatigue model have used heavy machinery to estimate the parameters: nonlinear least squares, linear regression with complex spline-convolution features, and the Kalman Filter, to name three.

That a linear regression (fit via GLS) on a few lagged variables was able to recover the theoretical underlying process speaks to the expressive power of the dynamic regression model.

What is the cost of blindly regressing on a few lags each of input and output? Focusing only on one-step-ahead forecast accuracy, probably not much.

The OLS regression with lagged variables “explained” most of the variation in the next performance value, but it’s also suggesting a quite different process than the one used to simulate the data.

The internals of this process were recovered by the GLS regression, which is the sense of getting to the "truth" promised in the title.

A topic for further research and discussion would be whether it is wise to use GLS regression with an MA(q) error structure for all dynamic regressions.

It is easy enough to implement in R via nlme.

On the other hand, the gls prediction function did not use the error structure, and it would thus require additional effort from the user to make the GLS forecasts competitive with those of OLS.

For out-of-the-box, one-step-ahead prediction, it may be difficult to justify the added complexity of the GLS procedure.

For getting to the true underlying system parameters, incorporating an MA(q) error structure via GLS may very well bring us closer to the truth.
