Difference Rewards Policy Gradients

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. However, a key challenge that many of these methods do not address is multi-agent credit assignment: assessing an individual agent’s contribution to overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.REINFORCE that explicitly tackles this problem by combining difference rewards with policy gradients, allowing decentralized policies to be learned when the reward function is known. By differencing the reward function directly, Dr.REINFORCE avoids the difficulties associated with learning a Q-function, as done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.REINFORCE that learns an additional reward network used to estimate the difference rewards.
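The core idea, combining a difference reward with a REINFORCE-style update, can be illustrated with a minimal sketch. This is not the paper's implementation: the coordination reward, the two-action agents, and the uniform counterfactual baseline are all illustrative assumptions. Each agent's difference reward D_i subtracts, from the team reward, the average reward obtained when agent i's action is replaced by a counterfactual action, and that D_i is used as the return in an independent policy-gradient update:

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_ACTIONS = 3, 2

def reward(actions):
    # Hypothetical known team reward: 1 when all agents pick the same action.
    return 1.0 if len(set(actions)) == 1 else 0.0

def difference_reward(actions, i):
    # D_i = r(a) - E_b[r(a_-i, b)]: agent i's marginal contribution, with the
    # counterfactual baseline averaged uniformly over i's alternative actions.
    counterfactual = np.mean([
        reward(actions[:i] + (b,) + actions[i + 1:]) for b in range(N_ACTIONS)
    ])
    return reward(actions) - counterfactual

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# One independent softmax policy (logit vector) per agent.
logits = np.zeros((N_AGENTS, N_ACTIONS))
lr = 0.5

for step in range(200):
    probs = np.array([softmax(l) for l in logits])
    actions = tuple(rng.choice(N_ACTIONS, p=p) for p in probs)
    for i in range(N_AGENTS):
        d_i = difference_reward(actions, i)
        # REINFORCE: grad of log pi_i(a_i) for a softmax policy is
        # one_hot(a_i) - probs_i; scale it by the difference reward.
        grad = -probs[i]
        grad[actions[i]] += 1.0
        logits[i] += lr * d_i * grad

# Agents that match the majority receive positive D_i, odd ones out receive
# negative D_i, so the policies are typically driven toward a common action.
```

Note that when the other agents disagree among themselves, D_i is zero for agent i (no action of i can earn reward), so the difference reward filters out updates for which the agent had no influence, which is exactly the credit-assignment benefit the abstract describes.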

Reinforcement learning in Minecraft: Challenges and opportunities in multiplayer games

Games have a long history as test beds for pushing AI research forward. From early work on chess and Go to more recent advances on modern video games, researchers have used games as complex decision-making benchmarks. Learning in multi-agent settings is one of the fundamental problems in AI research, posing unique challenges for agents that learn independently, such as coordinating with other learning agents or adapting rapidly online to agents they haven’t previously learned with. In this webinar, join Microsoft researcher Sam Devlin and Queen Mary University of London researchers Martin Balla, Raluca D. Gaina, and Diego Perez-Liebana to learn how the latest AI techniques can be applied to multiplayer games in the challenging and diverse 3D environment of Minecraft. The researchers will demonstrate how Project…