This is an interesting (to paraphrase the article) “distillation of reinforcement learning research from the past two decades,” approached from a cognitive science perspective. It considers the paradigm of training a large (many-parameter) model to build a representation of the environment and its future, then training a smaller controller model, which outputs actions, from that representation. This model of the future, they argue, mirrors the way humans make decisions: by considering how their actions will affect the future.
The RNN part of the model uses an approach similar to the one SketchRNN uses, here to predict the future state of the environment. It steps a compressed version of the environment (the latent vector produced by a VAE) forward through an RNN, taking, at each step t, the hidden vector h_{t-1}, the action a_{t-1}, and an additional input z_{t-1}. The next latent z_t is a vector sampled from the mixture of Gaussian distributions output by a Mixture Density Network (MDN) head on the RNN.
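To make that sampling step concrete, here is a minimal numpy sketch of drawing a z_t from the mixture of Gaussians an MDN head might output. The names (pi, mu, log_sigma), the diagonal-Gaussian assumption, and the temperature knob are my own illustration, not the paper’s code.

```python
import numpy as np

def sample_from_mdn(pi, mu, log_sigma, temperature=1.0):
    """Draw one latent vector from a mixture of diagonal Gaussians.

    pi        : (K,) mixture weights, summing to 1
    mu        : (K, D) component means
    log_sigma : (K, D) log standard deviations
    """
    # A temperature term is often used to sharpen or soften the mixture
    # when sampling rollouts from an MDN-RNN.
    if temperature != 1.0:
        logits = np.log(pi) / temperature
        pi = np.exp(logits - logits.max())
        pi = pi / pi.sum()

    k = np.random.choice(len(pi), p=pi)                   # pick a mixture component
    sigma = np.exp(log_sigma[k]) * np.sqrt(temperature)
    return mu[k] + sigma * np.random.randn(mu.shape[1])   # sample from that component

# Example: a 3-component mixture over a 32-dimensional latent space.
z_t = sample_from_mdn(pi=np.array([0.2, 0.5, 0.3]),
                      mu=np.zeros((3, 32)),
                      log_sigma=np.zeros((3, 32)))
```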
The actions themselves come from the final piece of the model, the controller, which takes h_t and z_t as input and outputs a_t.
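The controller really is tiny: the paper describes it as a single linear layer mapping the concatenated [z_t, h_t] to the action, small enough to be trained with an evolution strategy (CMA-ES) rather than backpropagation. A rough sketch, with made-up dimensions:

```python
import numpy as np

# Hypothetical sizes: 32-dim latent z, 256-dim RNN hidden state, 3-dim action.
Z_DIM, H_DIM, A_DIM = 32, 256, 3

W_c = np.random.randn(A_DIM, Z_DIM + H_DIM) * 0.01  # the controller's only parameters
b_c = np.zeros(A_DIM)

def controller(z_t, h_t):
    """Linear map a_t = W_c [z_t ; h_t] + b_c."""
    x = np.concatenate([z_t, h_t])
    return np.tanh(W_c @ x + b_c)  # tanh keeps actions bounded; an illustrative choice
```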
They conduct a variety of experiments using different combinations of the three components, showing, e.g., that the full model performs better than feeding the controller only part of that representation (such as the latent z alone).
What’s interesting is that they can run the VAE’s decoder on the latent vectors the MDN-RNN predicts, giving a visual representation of the model’s prediction of the future environment. Because the MDN-RNN is a full generative model, it can stand in for the environment itself: after training V and M on real rollouts, they can train the controller C entirely inside this hallucinated “dream” environment and then transfer the resulting policy back to the real environment.
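To sketch what such a dream rollout could look like end to end, the following is a hypothetical loop built from the pieces above; rnn_step, controller, sample_from_mdn, and decode are assumed interfaces for the MDN-RNN, the controller, the sampling routine, and the VAE decoder, not the paper’s actual API.

```python
def dream_rollout(z0, h0, steps, rnn_step, controller, sample_from_mdn, decode=None):
    """Roll the world model forward without ever touching the real environment.

    rnn_step(z, a, h) -> (pi, mu, log_sigma, h_next)   # MDN-RNN update
    controller(z, h)  -> a                             # tiny policy
    decode(z)         -> image                         # optional VAE visualization
    """
    z, h = z0, h0
    frames = []
    for _ in range(steps):
        a = controller(z, h)                       # act on the current belief
        pi, mu, log_sigma, h = rnn_step(z, a, h)   # predict the next latent distribution
        z = sample_from_mdn(pi, mu, log_sigma)     # hallucinate the next state
        if decode is not None:
            frames.append(decode(z))               # render the dream for inspection
    return frames
```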