Machine Learning

Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear



  • arXiv

    To use deep reinforcement learning in the wild, we might hope for an agent that can avoid catastrophic mistakes. Unfortunately, even in simple environments, the popular deep Q-network (DQN) algorithm is doomed by a Sisyphean curse. Owing to the use of function approximation, these agents may eventually forget experiences as they become exceedingly unlikely under a new policy. Consequently, for as long as they continue to train, DQNs may periodically repeat avoidable catastrophic mistakes. In this paper, we learn a reward shaping that accelerates learning and guards oscillating policies against repeated catastrophes. First, we demonstrate unacceptable performance of DQNs on two toy problems. We then introduce intrinsic fear, a new method that mitigates these problems by avoiding dangerous states. Our approach incorporates a second model trained via supervised learning to predict the probability of catastrophe within a short number of steps. This score then acts to penalize the Q-learning objective. Equipped with intrinsic fear, our DQNs solve the toy environments and improve on the Atari games Seaquest, Asteroids, and Freeway.
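    The penalized target described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `fear_prob` and `fear_coeff` are assumed placeholders for the supervised fear model's output F(s') and its penalty weight.

    ```python
    import numpy as np

    def intrinsic_fear_target(reward, next_q_values, fear_prob,
                              gamma=0.99, fear_coeff=1.0):
        """Q-learning target with an intrinsic-fear penalty.

        reward:        immediate reward r
        next_q_values: array of Q(s', a) over actions a
        fear_prob:     F(s') in [0, 1], the fear model's predicted
                       probability of catastrophe within a few steps
                       (assumed to come from a separately trained
                       supervised classifier, as in the abstract)
        gamma:         discount factor
        fear_coeff:    penalty weight on the fear score
        """
        # Standard bootstrapped target, minus a penalty proportional
        # to how dangerous the next state looks.
        return reward + gamma * np.max(next_q_values) - fear_coeff * fear_prob

    # Example: the same transition scored with a safe next state
    # versus one the fear model flags as near-catastrophic.
    safe = intrinsic_fear_target(1.0, np.array([0.5, 0.2]), fear_prob=0.0)
    risky = intrinsic_fear_target(1.0, np.array([0.5, 0.2]), fear_prob=0.9)
    ```

    With these numbers the risky transition's target drops below the safe one's, so Q-learning is steered away from states the fear model deems dangerous.
    
    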

    Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear
    by Zachary C. Lipton, Abhishek Kumar, Lihong Li, Jianfeng Gao, Li Deng
    https://arxiv.org/pdf/1611.01211v7.pdf
