#### Consequentialist conditional cooperation in social dilemmas with imperfect information

Social dilemmas, where mutual cooperation can lead to high payoffs but participants face incentives to cheat, are ubiquitous in multi-agent interaction. We wish to construct agents that cooperate with pure cooperators, avoid exploitation by pure defectors, and incentivize cooperation from the rest.…

#### Map-based Multi-Policy Reinforcement Learning: Enhancing Adaptability of Robots by Deep Reinforcement Learning

In order for robots to perform mission-critical tasks, it is essential that they are able to quickly adapt to changes in their environment as well as to injuries and or other bodily changes. Deep reinforcement learning has been shown to be successful in training robot control policies for operation…

#### The Effects of Memory Replay in Reinforcement Learning

Experience replay is a key technique behind many recent advances in deep reinforcement learning. Allowing the agent to learn from earlier memories can speed up learning and break undesirable temporal correlations. Despite its wide-spread application, very little is understood about the properties o…

#### Asymmetric Actor Critic for Image-Based Robot Learning

Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision making domains. However, Robotics poses many challenges for RL, most notably training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies…

#### Reverse Curriculum Generation for Reinforcement Learning

Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinf…

#### Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (…

#### Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control

car gradient Reinforcement Learning

Flow is a new computational framework, built to support a key need triggered by the rapid growth of autonomy in ground traffic: controllers for autonomous vehicles in the presence of complex nonlinear dynamics in traffic. Leveraging recent advances in deep Reinforcement Learning (RL), Flow enables …

#### Manifold Regularization for Kernelized LSTD

gradient Reinforcement Learning

Policy evaluation or value function or Q-function approximation is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods. Therefore its quality has a significant impact on most RL algorithms.…

#### Emergent Complexity via Multi-Agent Competition

Reinforcement learning algorithms can train agents that solve problems in complex, interesting environments. Normally, the complexity of the trained agent is closely related to the complexity of the environment. This suggests that a highly capable agent requires a complex environment for training. …

#### Learning to Generalize: Meta-Learning for Domain Generalization

Domain shift refers to the well known problem that a model trained in one source domain performs poorly when applied to a target domain with different statistics. {Domain Generalization} (DG) techniques attempt to alleviate this issue by producing models which by design generalize well to novel tes…

#### On- and Off-Policy Monotonic Policy Improvement

gradient Reinforcement Learning

Monotonic policy improvement and off-policy learning are two main desirable properties for reinforcement learning algorithms. In this study, we show that the monotonic policy improvement is guaranteed from on- and off-policy mixture data. Based on the theoretical result, we provide an algorithm whi…

#### Recurrent Network-based Deterministic Policy Gradient for Solving Bipedal Walking Challenge on Rugged Terrains

gradient LSTM Reinforcement Learning RNN

This paper presents the learning algorithm based on the Recurrent Network-based Deterministic Policy Gradient. The Long-Short Term Memory is utilized to enable the Partially Observed Markov Decision Process framework. The novelty are improvements of LSTM networks: update of multi-step temporal diff…

#### Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear

DQN game Reinforcement Learning

To use deep reinforcement learning in the wild, we might hope for an agent that can avoid catastrophic mistakes. Unfortunately, even in simple environments, the popular deep Q-network (DQN) algorithm is doomed by a Sisyphean curse. Owing to the use of function approximation, these agents may eventu…

#### Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear

DQN game Reinforcement Learning

To use deep reinforcement learning in the wild, we might hope for an agent that can avoid catastrophic mistakes. Unfortunately, even in simple environments, the popular deep Q-network (DQN) algorithm is doomed by a Sisyphean curse. Owing to the use of function approximation, these agents may eventu…

#### Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear

DQN game Reinforcement Learning

To use deep reinforcement learning in the wild, we might hope for an agent that can avoid catastrophic mistakes. Unfortunately, even in simple environments, the popular deep Q-network (DQN) algorithm is doomed by a Sisyphean curse. Owing to the use of function approximation, these agents may eventu…

#### Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis`

health motion Reinforcement Learning

This work handles the inverse reinforcement learning (IRL) problem where only a small number of demonstrations are available from a demonstrator for each high-dimensional task, insufficient toestimate an accurate reward function. Observing that each demonstrator has an inherent reward for each stat…

#### Rainbow: Combining Improvements in Deep Reinforcement Learning

DQN game Reinforcement Learning

The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combinat…

#### Multi-Level Discovery of Deep Options

DQN game Reinforcement Learning

Augmenting an agent’s control with useful higher-level behaviors called options can greatly reduce the sample complexity of reinforcement learning, but manually designing options is infeasible in high-dimensional and abstract state spaces. While recent work has proposed several techniques for…

#### Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning

gradient Reinforcement Learning

Deep reinforcement learning for multi-agent cooperation and competition has been a hot topic recently. This paper focuses on cooperative multi-agent problem based on actor-critic methods under local observations settings. Multi agent deep deterministic policy gradient obtained state of art results …

#### Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight

Deep reinforcement learning has shown promising results in learning control policies for complex sequential decision-making tasks. However, these neural network-based policies are known to be vulnerable to adversarial examples. This vulnerability poses a potentially serious threat to safety-critica…

#### Deep Abstract Q-Networks

DQN game Reinforcement Learning

We examine the problem of learning and planning on high-dimensional domains with long horizons and sparse rewards. Recent approaches have shown great successes in many Atari 2600 domains. However, domains with long horizons and sparse rewards, such as Montezuma’s Revenge and Venture, remain c…

#### Optimal Distributed Control of Multi-agent Systems in Contested Environments via Reinforcement Learning

This paper presents a model-free reinforcement learning (RL) based distributed control protocol for leader-follower multi-agent systems. Although RL has been successfully used to learn optimal control protocols for multi-agent systems, the effects of adversarial inputs are ignored. It is shown in t…

#### Vision-based deep execution monitoring

Bayes DNN Reinforcement Learning

Execution monitor of high-level robot actions can be effectively improved by visual monitoring the state of the world in terms of preconditions and postconditions that hold before and after the execution of an action. Furthermore a policy for searching where to look at, either for verifying the rel…

#### Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation

Enabling robots to autonomously navigate complex environments is essential for real-world deployment. Prior methods approach this problem by having the robot maintain an internal map of the world, and then use a localization and planning method to navigate through the internal map. However, these a…

#### The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings

language Reinforcement Learning

We motivate and describe a new freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data has been collected using a novel, character-by-character variant of the DiET chat tool (Healey et a…