In this case the environmental state becomes the “hidden state” that changes between time steps.
To demonstrate this technique we looked at the pendulum environment, where the task is to swing a pendulum until it stands upright, keeping it balanced with minimal effort.
This is hard for RL models; after around 20 episodes of training the problem is solved, but often the route to a solution is visibly sub-optimal.
In contrast, BPTT can beat the RL leaderboard in a single episode of training.
It’s instructive to actually watch this episode unfold; at the beginning of the recording the strategy is random, and the model improves over time.
The pace of learning is almost alarming.
Despite only experiencing a single episode, the model generalises well to handle any initial angle, and has something pretty close to the optimal strategy.
When restarted the model looks more like this.
This is just the beginning; we’ll get the real wins applying DP to environments that are too hard for RL to work with at all.
The Map Is Not The TerritoryThe limitation of these toy models is that they equate the simulated training environment with the test environment; of course, the real world is not differentiable.
In a more realistic model, the simulation gives us a coarse outline of behaviour that is refined with data.
That data informs (say) the simulated effect of wind, in turn improving the quality of gradients the simulator passes to the controller.
Models can even form part of a controller’s forward pass, enabling it to refine its predictions without having to learn system dynamics from scratch.
Exploring these new architectures will make for exciting future work.
CodaThe core idea is that differentiable programming, where we simply write an arbitrary numerical program and optimise it via gradients, is a powerful way to come up with better deep learning-like models and architectures — especially when we have a large library of differentiable code to hand.
The toy models described are really only previews, but we hope they give an intuition for how these ideas can be applied in more realistic ways.
Just as functional programming involves reasoning about and expressing algorithms using functional patterns, differentiable programming involves expressing algorithms using differentiable patterns.
Many such design patterns have already been developed by the deep learning community, such as for handling control problems or sequence and ttree-structureddata.
This post has introduced a couple of new ones, and as the field matures many more will be invented.
The resulting programs are likely to make even the most advanced current deep learning architectures look crude by comparison.
Check out the github accounts for all authors:Mike Innes — https://github.
com/MikeInnesNeethu Maria Joy — https://github.
com/RoboneetTejan Karmali — https://github.
com/tejank10Originally posted here.
Read more data science articles on OpenDataScience.
com, including tutorials and guides from beginner to advanced levels!.Subscribe to our weekly newsletter here and receive the latest news every Thursday.
.. More details