Here is another figure from the paper to illustrate this:Pushing forward a base distribution to a trajectory distribution.If the simulator is differentiable and invertible, then we can use the change-of-variables formula to derive,With some kind-of simple substitution we can rewrite the symmetric loss from earlier as,The next step is choosing an invertible and differentiable simulator; luckily, the trajectory forecasting task lends itself to some pretty glaring ones from stochastic dynamical systems..The writers chose a stochastic one-step policy,This gives us an iterative way of generating the next point along a trajectory using the past motion profile and context..As long as the stochastic term is invertible, and both terms are differentiable, a simulator using this policy will also be invertible and differentiable.For example, if the noise distribution is the standard normal, then the trajectory distribution is also a normal,Now, using a one-step policy like the one proposed makes computing values in our symmetric loss easy..Namely,Trivial as a Steph Curry 3-pointer right?.(I was told to include a sports reference to make myself appear cool.)Well, for those of you who disagree (myself included because I slept through a lot of my linear algebra classes), I’ll show some of the worked-out math..It hinges on the observation that trajectory points don’t depend on points for later timesteps, which makes the Jacobian lower triangular,A special property of such matrices, which you can easily verify by row/column-expansion, is that their determinants are just the product of the diagonal entries..This gives the result,Finally, we have to revisit this problem I promised I’d go over about 5 minutes ago — approximating the underlying data distribution since we cannot evaluate the underlying’s PDF directly..It’s important that this fixed approximation is decent, because it could add unnecessarily high penalty if it severely underestimates the training distribution in some region.One such method is assuming that the trajectory distribution factors over timesteps, i.e..parameterizing,Then, for each timestep, we discretize the region of possible points into L possible locations..We essentially model the probability distribution for each timestep as some discrete grid map, which can be trained via logistic regression with L classes,The paper shows some examples of learned spatial cost functions..As you can see, it typically gives low-ish cost to all drivable surfaces, and high penalty to obstacles..There’s definitely some questionable parts of the learned distributions, but at least its support (of low-cost regions) covers all the plausible trajectories.Learned prior (white: high cost, black: low cost).. More details