To prepare for this question, Part II of Modeling Cumulative Impact develops a flexible spline-based model of training and performance.
To test it out, we’ll use it to recover the theoretical model used to simulate H.’s data, and in the next part we’ll try it on some real athletes.
Parametric B-splines are represented as η(t) = θ_1 + Σ_{j=2}^{p} θ_j g_j(t) for basis functions g_j(t), j = 2, …, p.
Using the ns() function from R’s splines package, the basis functions are completely determined by the choice of knot placement.
Given a reasonable set of knots, if we can estimate θ_1, …, θ_p from data, we’ll have a new model with a flexible impulse response.
Since convolution is distributive over addition, substituting the fitness and fatigue constructs with η(t) just works, and at the end we are left with p convolution-based variables instead of two in a performance regression.
The result of the algebra is that, instead of convolving training intensities with exponential decay, we’re convolving them with spline basis functions and a constant offset.
These new convolution variables, z_1, …, z_p, enter the performance regression formula just like fitness and fatigue in Part I.
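The algebra behind these variables can be sketched as follows. The notation is an assumption made for illustration: w_s is the training intensity on day s, μ is the performance baseline, p_t is performance on day t, and the sums run over the sessions before day t, as in the usual discrete fitness-fatigue convolution.

```latex
\begin{aligned}
\mathrm{E}[p_t]
  &= \mu + \sum_{s=1}^{t-1} \eta(t-s)\, w_s \\
  &= \mu + \sum_{s=1}^{t-1} \Big[\theta_1 + \sum_{j=2}^{p} \theta_j\, g_j(t-s)\Big] w_s \\
  &= \mu + \theta_1 \underbrace{\sum_{s=1}^{t-1} w_s}_{z_1(t)}
       + \sum_{j=2}^{p} \theta_j \underbrace{\sum_{s=1}^{t-1} g_j(t-s)\, w_s}_{z_j(t)}
\end{aligned}
```

Under this convention, z_1(t) is the running total of past training intensity (training convolved with a constant), and each z_j(t), j = 2, …, p, is training convolved with one spline basis function.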
To procure our basis functions g_j, we’ll have to face the knot-selection problem associated with parametric splines.
Knots are the “breakpoints” where two approximating cubic polynomials meet, and they should be chosen to capture important non-linear behavior.
But how do you know the important non-linear behavior without knowing the to-be-estimated function in the first place?

To start this chicken-and-egg problem with the chicken, we’ll use a nearly ideal knot placement for our simulated data.
In the fitness-fatigue model, the combined impulse response, ϕ(t), is a linear combination of the fitness and fatigue exponentials. Convolving this single function with training intensity is the same as convolving the two separate exponentials, and our eventual goal will be to estimate ϕ(t) with our spline approach from data.
This function reflects the true impulse response for a training event considering both fitness and fatigue simultaneously.
Figure 1. Theoretical system response (solid line) to one of H.’s training events.
With three carefully chosen knots, our B-spline (dashed) is capable of approximating the function with minimal bias.
Examining the graph of ϕ(t) (Figure 1) in relation to H.’s performance, we see that in the initial days following a training event, there is a net negative effect on H.
By around day 25 after the training event, the net negative impact has been neutralized.
At around day 40 after the event, with fatigue having dissipated and fitness still strong, H. is enjoying maximum benefit from the original training session.
After day 40, the slow exponential decline of fitness gradually brings the benefit of the one workout to zero impact after around 200 days.
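The shape described above can be explored numerically. The following is a minimal sketch, assuming the standard two-exponential form ϕ(t) = k1·exp(−t/τ1) − k2·exp(−t/τ2). The parameter values are illustrative assumptions and are not confirmed to be the ones used to simulate H.’s data.

```r
# Assumed fitness-fatigue parameters (illustrative only; not taken from the article)
k1 <- 0.07; tau1 <- 60   # fitness gain and time constant (days)
k2 <- 0.27; tau2 <- 13   # fatigue gain and time constant (days)

# Combined impulse response: fitness minus fatigue
phi <- function(t) k1 * exp(-t / tau1) - k2 * exp(-t / tau2)

days <- 0:250
phi_vals <- phi(days)

# First day on which the net effect of one session is no longer negative
break_even_day <- days[which(phi_vals >= 0)[1]]

# Day on which the net benefit of one session is largest
peak_day <- days[which.max(phi_vals)]

print(c(break_even = break_even_day, peak = peak_day))
```

The break-even and peak days depend on the gains and time constants, so under these assumed values they need not match the days quoted above exactly.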
In this article we do not see ϕ(t) directly but will estimate it in a flexible manner via our B-spline η(t), which itself depends on estimated parameters.
A B-spline with only three interior knots just about nails the approximation if you put the knots in the right places, say at 14, 40, and 100 days, but even then there will be some bias.
Figure 1 also shows η*(t), or our spline fit directly to the true function ϕ(t) via linear regression, which is darn close.
The regression code is shown in Code Block 2.

Code Block 2. Comparing the true theoretical impulse response with B-spline approximation.
Full example with plots shown in the Jupyter notebook.
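For readers without the notebook at hand, the η*(t) regression can be sketched roughly as below. This is not the author’s Code Block 2. The names plot_df, level, my_spline, and eta_star_lm follow the printed output in this article, while the day grid and the parameters of ϕ(t) are assumptions.

```r
library(splines)

# Assumed true impulse response (parameter values are illustrative assumptions)
phi <- function(t) 0.07 * exp(-t / 60) - 0.27 * exp(-t / 13)

# Evaluate phi on a grid of days since the training event (grid is an assumption)
plot_df <- data.frame(day = 0:259)
plot_df$level <- phi(plot_df$day)

# Natural cubic spline basis with the three interior knots from the text
my_spline <- ns(plot_df$day, knots = c(14, 40, 100))

# Regress the true function on the basis; the fitted values are eta*(t)
eta_star_lm <- lm(plot_df$level ~ my_spline)
plot_df$eta_star <- fitted(eta_star_lm)

print(eta_star_lm)
```

With three interior knots and no intercept column, ns() returns four basis functions, so the regression has five coefficients: the constant θ_1 plus one per basis function.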
Aside from visually showing us how close η(t) can come to ϕ(t) with perfect information, the η*(t) regression gives us the best parameter values, θ_1, …, θ_p, for approximating ϕ(t).
If we can get close to these coefficient values using training and performance data, we’re on the right track.
print(eta_star_lm)

Call:
lm(formula = plot_df$level ~ my_spline)

Coefficients:
(Intercept)  my_spline1  my_spline2  my_spline3  my_spline4
-0.[...]0756

It’s time to estimate ϕ(t) with our B-spline η(t) from actual training and performance data.
In this article, we will bring along one more thing: our perfectly chosen knots.
This is for simplicity of illustration and is “cheating” given our knowledge of the simulated world.
In the next part of this series, we’ll work with performance data of real swimmers where we will not have this luxury.
Recall from Part I that estimating ϕ(t) via the antagonistic fitness and fatigue constructs was a numerical effort requiring either direct nonlinear estimation or a grid search in the time constants followed by OLS regression.
An added benefit to the flexibility afforded by the spline convolution approach is that it’s entirely linear.
The following 20 lines of code depend only on the simulated data in train_df and complete the spline-based convolution estimation.

Code Block 3. Introducing a spline-based method for modeling cumulative impact.
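The estimation can be sketched as below. This is a self-contained approximation rather than the author’s Code Block 3, so it is longer than 20 lines: it first simulates a stand-in for train_df. The column names day, w, and perf, the helper convolve_training, and all parameter values are assumptions; train_aug_df, spline_reg, z_1 to z_5, and perf_hat follow the names used in this article.

```r
library(splines)
set.seed(42)

# --- Simulated stand-in for train_df (names and parameters are assumptions) ---
n <- 259
train_df <- data.frame(day = 1:n, w = round(runif(n, 50, 150)))
phi <- function(t) 0.07 * exp(-t / 60) - 0.27 * exp(-t / 13)

# Convolve a kernel with training: result[t] = sum over s < t of kernel(t - s) * w[s]
# kernel_vals[k] must hold the kernel evaluated at lag k
convolve_training <- function(kernel_vals, w) {
  vapply(seq_along(w), function(t) {
    if (t == 1) return(0)            # no sessions before day 1
    s <- 1:(t - 1)
    sum(kernel_vals[t - s] * w[s])
  }, numeric(1))
}

train_df$perf <- 496 + convolve_training(phi(1:n), train_df$w) + rnorm(n, sd = 7)

# --- Spline-based convolution features ---
lags <- 1:n
# Column of ones (constant term of eta), then the four spline basis functions
basis <- cbind(1, ns(lags, knots = c(14, 40, 100)))
z <- apply(basis, 2, convolve_training, w = train_df$w)
colnames(z) <- paste0("z_", seq_len(ncol(z)))
train_aug_df <- cbind(train_df, z)

# --- Ordinary linear regression on the convolution features ---
spline_reg <- lm(perf ~ z_1 + z_2 + z_3 + z_4 + z_5, data = train_aug_df)
train_aug_df$perf_hat <- fitted(spline_reg)
```

The only iterative piece is the convolution helper. Once the z variables exist, the fit is a single lm() call with no starting values or grid search.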
The regression coefficient estimates match those of the η*(t) regression above, up to estimation error at least.
Notice that the variable z_1 actually corresponds to the intercept of the η*(t) regression, since that variable was created via a convolution of a column of ones with training intensity.
The new intercept is just the performance baseline and is conceptually the same as in the fitness-fatigue model.
> summary(spline_reg)

Call:
lm(formula = perf ~ z_1 + z_2 + z_3 + z_4 + z_5, data = train_aug_df)

Residuals:
Min 1Q Median 3Q Max
-16.[...]6338

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 491.[...]81 <2e-16 ***
z_1         -0.[...]22 <2e-16 ***
z_2         0.[...]84 <2e-16 ***
z_3         0.[...]55 <2e-16 ***
z_4         0.[...]51 <2e-16 ***
z_5         0.[...]07 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.965 on 253 degrees of freedom
Multiple R-squared: 0.9408, Adjusted R-squared: 0.[...]
F-statistic: [...]5 on 5 and 253 DF, p-value: < 2.2e-16

For the purposes of prediction, there’s no need to reconstruct the estimate of ϕ(t) and reconvolve with the data.
The convolution-based features are already in the regression and the predictions are the fitted values stored in train_aug_df$perf_hat.
From a modeling and prediction standpoint, our job of replacing the theory-driven fitness-fatigue model with an empirically driven alternative is done.
Just like in Part I, we have performance predictions based on past training sessions and the recipe to reconstruct the total impulse response function.
To fully close the loop, Code Block 4 does the following things:
- uses the regression coefficients obtained above to estimate the true impulse response ϕ(t) from data, i.e., “η(t) hat,”
- convolves training with the fitted impulse response η(t) hat to demonstrate that, aside from end-effects, the result is equivalent to the fitted values from the regression,
- convolves training with the true impulse response ϕ(t), i.e., the fitness-fatigue model in this article, for purposes of comparison.
Code Block 4. Closing the loop.
Note that the get_eta_hat function discards the first coefficient, the performance baseline, and adds it back in on line 23.
The second coefficient is also an intercept, the constant term in η(t), explaining the column of ones appended to the spline variables.
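The three steps can be sketched in a self-contained way as below. This is not the author’s Code Block 4, so the line number mentioned above does not apply to it. The function name get_eta_hat is reused from the text, but its body, the helper conv, the simulated data, and the parameter values are all assumptions.

```r
library(splines)
set.seed(7)

# Hypothetical stand-in data; names and parameter values are assumptions
n <- 259
w <- round(runif(n, 50, 150))
phi <- function(t) 0.07 * exp(-t / 60) - 0.27 * exp(-t / 13)

# result[t] = sum over s < t of kernel(t - s) * w[s]
conv <- function(kernel_vals, w) {
  vapply(seq_along(w), function(t) {
    if (t == 1) return(0)
    s <- 1:(t - 1)
    sum(kernel_vals[t - s] * w[s])
  }, numeric(1))
}
perf <- 496 + conv(phi(1:n), w) + rnorm(n, sd = 7)

# Spline convolution regression: column of ones plus four basis functions
basis <- cbind(1, ns(1:n, knots = c(14, 40, 100)))
z <- apply(basis, 2, conv, w = w)
spline_reg <- lm(perf ~ z)

# Step 1: rebuild the fitted impulse response from the coefficients.
# The first coefficient is the performance baseline, so it is set aside here.
get_eta_hat <- function(fit, basis) {
  theta_hat <- coef(fit)[-1]
  as.vector(basis %*% theta_hat)
}
eta_hat <- get_eta_hat(spline_reg, basis)

# Step 2: convolve training with eta_hat and add the baseline back in
perf_from_eta <- coef(spline_reg)[1] + conv(eta_hat, w)

# Step 3: convolve training with the true impulse response for comparison
perf_ff <- 496 + conv(phi(1:n), w)

# Largest gap between step 2 and the regression's fitted values
max_gap <- max(abs(perf_from_eta - fitted(spline_reg)))
print(max_gap)
```

Because this sketch uses the same convolution helper for the features and for the reconstruction, the two sets of predictions agree up to floating-point error. The end-effects mentioned above would appear only if the convolutions were truncated or padded differently.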
The computations above lead to the following comparative plots of the spline-based method with the fitness-fatigue model used to simulate the data.
As in Figure 1, the solid line is still the theoretical system response to one of H.’s training events.
The dashed line also looks similar, but here it is eta_hat, the spline fit from the training and performance data.
Performance predictions based on past training sessions for classical fitness-fatigue and presented spline-based method.
While exponential decay is widely applicable, there are many other interesting decay profiles.
The moving average model in time series, for instance, says that a random shock has influence for a fixed number of time periods before dropping off completely.
Or perhaps the impact of an event decays down to a non-zero level, as we’d like to believe is true of fitness.
The spline-based method shown in this article accommodates these departures from exponential decay.
This article, Part II of Modeling Cumulative Impact, still relied on simulated data and was able to choose knots based on an unrealistic knowledge of the world.
The next part will investigate this method on real athletes, where there are missing data and dynamics that play out on a different time scale.
While the method loses the intuitive appeal of fitness and fatigue as model constructs, the closed-form solution and the lack of a need for starting values make it an attractive alternative.
The method makes performance predictions that are very similar to its theory-driven, fully parametric counterpart and delivers an estimate of the impulse response that continues to have an intuitive subject matter interpretation.
In the next article, we “drop the hammer” and hit the water with five real swimmers to test out this method in practice.
It’s coming in Part III of Modeling Cumulative Impact.
[...] Banister, Modeling human performance in running (1985), Journal of Applied Physiology.
T. [...] Candau, and J. Lacour, Fatigue and fitness modelled from the effects of training on performance (1994), European Journal of Applied Physiology.