
Cold-Start Reinforcement Learning with Softmax Policy Gradients



  • arXiv

    Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction. In this paper, we describe a reinforcement learning method based on a softmax policy that requires neither of these procedures. Our method combines the advantages of policy-gradient methods with the efficiency and simplicity of maximum-likelihood approaches. We apply this new cold-start reinforcement learning method to the training of sequence generation models for structured output prediction problems. Empirical evidence validates the method on automatic summarization and image captioning tasks. (A toy sketch of the softmax policy-gradient idea follows the citation below.)

    Cold-Start Reinforcement Learning with Softmax Policy Gradients
    by Nan Ding, Radu Soricut
    https://arxiv.org/pdf/1709.09346v1.pdf
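
    For orientation, here is a minimal, self-contained sketch of the basic ingredient the paper builds on: a REINFORCE-style gradient for a softmax (categorical) policy, trained from a cold start (random initialization, no warm-start pretraining, no variance-reduction baseline). This is a generic toy example on a synthetic bandit, not the authors' algorithm; the action count, reward values, and learning rate are all made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    K = 5                                   # toy bandit with 5 actions
    true_rewards = np.array([0.1, 0.2, 0.9, 0.3, 0.4])  # hypothetical payoffs
    theta = np.zeros(K)                     # cold start: zero logits = uniform policy

    def softmax(z):
        z = z - z.max()                     # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()

    lr = 0.1
    for step in range(2000):
        pi = softmax(theta)
        a = rng.choice(K, p=pi)             # sample an action from the policy
        r = true_rewards[a] + 0.1 * rng.standard_normal()  # noisy reward signal
        # REINFORCE score-function gradient for a softmax policy:
        #   grad_theta log pi(a) = one_hot(a) - pi
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta += lr * r * grad_log_pi       # ascend the expected reward

    print("learned policy:", np.round(softmax(theta), 3))
    # Typically concentrates most probability on the best action (index 2).

    The sampled, baseline-free update above is exactly the kind of high-variance estimator the abstract is talking about: naive policy gradients usually need warm-start training and variance-reduction tricks to work at sequence scale. The paper's softmax policy-gradient formulation is presented as removing both requirements while retaining the simplicity of maximum-likelihood training; see the linked PDF for the actual method.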
