![](https://www.microsoft.com/en-us/research/uploads/prod/2019/11/MSResearch_20191113_NeurIPS_BetterExpolorationWithOptimisticActorCritic_1400x788.png)
Optimistic Actor Critic avoids the pitfalls of greedy exploration in reinforcement learning
One of the core directions of Project Malmo is to develop AI capable of rich interactions. Whether that means learning new skills to apply to challenging problems, understanding complex environments, or knowing when to enlist the help of humans, reinforcement learning (RL) is a core…