Combining No-regret and Q-learning (2018)

The 14th European Workshop on Reinforcement Learning (EWRL 2018)

Also presented at the AAAI-19 workshop on Reinforcement Learning in Games.

Most reinforcement learning algorithms do not provide guarantees in settings with multiple agents or partial observability. A notable exception is Counterfactual Regret Minimization (CFR), which provides both strong convergence guarantees and strong empirical results in settings such as poker. We seek to understand how these guarantees could be achieved more broadly. As a first step in this direction, we introduce a simple algorithm, local no-regret learning (LONR), which captures the spirit of CFR but can be applied in settings without a terminal state. We prove its convergence for the basic case of MDPs and discuss research directions to extend our results to richer settings with multiple agents, partial observability, and sampling.
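
To make the core idea concrete, below is a minimal, hypothetical sketch (not the paper's implementation) of how a CFR-style regret-matching policy at each state can be combined with Q-learning/value-iteration-style backups in a small MDP with no terminal state. The toy MDP, variable names, and constants are all illustrative assumptions, not details from the paper.

```python
import numpy as np

# Illustrative toy MDP: 2 states, 2 actions, no terminal state (the agent loops forever).
# P[s, a] gives the deterministic next state; R[s, a] the immediate reward.
n_states, n_actions = 2, 2
P = np.array([[0, 1],
              [1, 0]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
gamma = 0.9

Q = np.zeros((n_states, n_actions))           # Q-values, updated by synchronous backups
regret_sum = np.zeros((n_states, n_actions))  # cumulative regrets per state (local no-regret learner)
policy_sum = np.zeros((n_states, n_actions))  # accumulator for the average policy

def regret_matching(regrets):
    """Regret matching: play actions in proportion to positive cumulative regret."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

for t in range(1000):
    # The policy at each state comes from a local regret-matching (no-regret) update,
    # in place of the max operator used by standard Q-learning / value iteration.
    pi = np.array([regret_matching(regret_sum[s]) for s in range(n_states)])
    policy_sum += pi

    # Bellman-style backup: the next-state value is the expected Q-value
    # under that state's regret-matching policy.
    V = (pi * Q).sum(axis=1)
    Q_new = R + gamma * V[P]  # V[P[s, a]] is the value of the successor of (s, a)

    # Local regret update: how much better each action was than the current policy.
    regret_sum += Q_new - (pi * Q_new).sum(axis=1, keepdims=True)
    Q = Q_new

avg_policy = policy_sum / policy_sum.sum(axis=1, keepdims=True)
print("Average policy per state:\n", avg_policy)
```

As in CFR, it is the average policy (accumulated in `policy_sum`) rather than the final iterate that carries the convergence guarantee; this sketch only illustrates the general structure of replacing the greedy maximization in Q-learning with a per-state no-regret learner.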