Combining No-regret and Q-learning (2018)

The 14th European Workshop on Reinforcement Learning (EWRL 2018)

Also presented at the AAAI-19 workshop on Reinforcement Learning in Games.

Most reinforcement learning algorithms do not provide guarantees in settings with multiple agents or partial observability. A notable exception is Counterfactual Regret Minimization (CFR), which provides both strong convergence guarantees and strong empirical results in settings such as poker. We seek to understand how these guarantees could be achieved more broadly. As a first step in this direction, we introduce a simple algorithm, local no-regret learning (LONR), which captures the spirit of CFR but can be applied in settings without a terminal state. We prove its convergence for the basic case of MDPs and discuss research directions to extend our results to richer settings with multiple agents, partial observability, and sampling.
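
To make the core idea concrete, below is a minimal, hypothetical sketch (not the paper's implementation) of how a CFR-style regret-matching policy at each state can be combined with Q-learning/value-iteration-style backups in a small MDP with no terminal state. The toy MDP, variable names, and constants are all illustrative assumptions, not details from the paper.

```python
import numpy as np

# Illustrative toy MDP: 2 states, 2 actions, no terminal state (the agent loops forever).
# P[s, a] gives the deterministic next state; R[s, a] the immediate reward.
n_states, n_actions = 2, 2
P = np.array([[0, 1],
              [1, 0]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
gamma = 0.9

Q = np.zeros((n_states, n_actions))           # Q-values, updated by synchronous backups
regret_sum = np.zeros((n_states, n_actions))  # cumulative regrets per state (local no-regret learner)
policy_sum = np.zeros((n_states, n_actions))  # accumulator for the average policy

def regret_matching(regrets):
    """Regret matching: play actions in proportion to positive cumulative regret."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

for t in range(1000):
    # The policy at each state comes from a local regret-matching (no-regret) update,
    # in place of the max operator used by standard Q-learning / value iteration.
    pi = np.array([regret_matching(regret_sum[s]) for s in range(n_states)])
    policy_sum += pi

    # Bellman-style backup: the next-state value is the expected Q-value
    # under that state's regret-matching policy.
    V = (pi * Q).sum(axis=1)
    Q_new = R + gamma * V[P]  # V[P[s, a]] is the value of the successor of (s, a)

    # Local regret update: how much better each action was than the current policy.
    regret_sum += Q_new - (pi * Q_new).sum(axis=1, keepdims=True)
    Q = Q_new

avg_policy = policy_sum / policy_sum.sum(axis=1, keepdims=True)
print("Average policy per state:\n", avg_policy)
```

As in CFR, it is the average policy (accumulated in `policy_sum`) rather than the final iterate that carries the convergence guarantee; this sketch only illustrates the general structure of replacing the greedy maximization in Q-learning with a per-state no-regret learner.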