Model-Based Action Exploration

Deep reinforcement learning has made great strides in solving challenging motion control tasks. Recently, there has been a significant amount of work on methods that exploit the data gathered during training, but less work on good methods for generating data to learn from. For continuous action domains, the typical method for generating exploratory actions is to sample from a Gaussian distribution centred on the mean of a policy. Although these methods can find an optimal policy, in practice they do not scale well, and solving environments with many action dimensions becomes impractical. We consider learning a forward dynamics model to predict the result, ($s_{t+1}$), of taking a particular action, ($a$), given a specific observation of the state, ($s_{t}$). With such a model, we can do what comes more naturally to biological systems that have already collected experience: perform internal predictions of outcomes and try the actions we believe have a reasonable chance of success. This method greatly reduces the space of exploratory actions, increasing learning speed and enabling higher-quality solutions to difficult problems, such as robotic locomotion.
by Glen Berseth, Michiel van de Panne
https://arxiv.org/pdf/1801.03954v1.pdf
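
To make the idea in the abstract concrete, here is a minimal sketch of using a learned forward dynamics model to filter exploratory actions. It is an illustration under stated assumptions, not the authors' procedure: the toy linear environment, the least-squares model, the quadratic `value` function, and every name in the code are hypothetical. Candidate actions are still drawn from a Gaussian around the policy mean, but the model predicts $s_{t+1}$ for each candidate and only the most promising one is kept.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 3, 2

# Hypothetical ground-truth linear dynamics, used only to generate toy data.
A_true = 0.9 * np.eye(STATE_DIM)
B_true = rng.normal(size=(STATE_DIM, ACTION_DIM))


def env_step(s, a):
    """Toy environment: s_{t+1} = A s_t + B a."""
    return A_true @ s + B_true @ a


def fit_forward_model(states, actions, next_states):
    """Least-squares fit of s_{t+1} ~ [s_t, a] @ W (an assumed linear model)."""
    X = np.hstack([states, actions])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W  # shape: (STATE_DIM + ACTION_DIM, STATE_DIM)


def predict_next_state(W, s, a):
    """Model prediction of s_{t+1} for a single state-action pair."""
    return np.concatenate([s, a]) @ W


def value(s):
    """Hypothetical value estimate: states near the origin are better."""
    return -np.sum(s ** 2, axis=-1)


def explore_action(W, s, policy_mean, sigma=0.5, n_candidates=64):
    """Sample Gaussian candidates around the policy mean and keep the one
    whose predicted next state has the highest estimated value."""
    noise = sigma * rng.normal(size=(n_candidates, ACTION_DIM))
    # The mean itself is included, so the choice is never predicted to be worse.
    candidates = np.vstack([policy_mean, policy_mean + noise])
    X = np.hstack([np.tile(s, (len(candidates), 1)), candidates])
    predicted_next = X @ W
    return candidates[np.argmax(value(predicted_next))]


# Gather random experience, then fit the forward model to it.
states = rng.normal(size=(500, STATE_DIM))
actions = rng.normal(size=(500, ACTION_DIM))
next_states = np.array([env_step(s, a) for s, a in zip(states, actions)])
W = fit_forward_model(states, actions, next_states)

s = np.array([1.0, -1.0, 0.5])
policy_mean = np.zeros(ACTION_DIM)
a = explore_action(W, s, policy_mean)
```

In this toy setting the model is exact because the data are noiseless and linear. With a learned neural model the predictions would be approximate, so the ranking of candidates should be treated as a heuristic rather than a guarantee.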