Postponed Updates for Temporal-Difference Reinforcement Learning

  • Harm van Seijen,
  • Shimon Whiteson

Ninth International Conference on Intelligent Systems Design and Applications, ISDA 2009, Pisa, Italy


This paper presents postponed updates, a new strategy for TD methods that can improve sample efficiency without incurring the computational and space requirements of model-based RL. By recording the agent's last-visit experience, the agent can delay its update until the given state is revisited, thereby improving the quality of the update. Experimental results demonstrate that postponed updates outperforms several competitors, most notably eligibility traces, a traditional way to improve the sample efficiency of TD methods. It achieves this without the need to tune an extra parameter, as eligibility traces require.
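The core mechanism can be illustrated with a small sketch. The following is a minimal, hypothetical rendering of the idea in tabular Q-learning (the class and method names are illustrative, not from the paper): instead of updating Q(s, a) immediately after observing (s, a, r, s'), the agent stores that last-visit experience and applies the update only when s is revisited, by which time the value estimates for s' have typically been improved by intervening updates.

```python
import random
from collections import defaultdict

GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1  # illustrative hyperparameters

class PostponedQAgent:
    """Hypothetical sketch of the postponed-updates idea on top of Q-learning."""

    def __init__(self, actions):
        self.actions = actions
        self.q = defaultdict(float)   # Q-values keyed by (state, action)
        self.pending = {}             # state -> stored last-visit experience (a, r, s_next)

    def _flush(self, s):
        """Apply the postponed update for state s, if one is stored."""
        if s in self.pending:
            a, r, s_next = self.pending.pop(s)
            target = r + GAMMA * max(self.q[(s_next, b)] for b in self.actions)
            self.q[(s, a)] += ALPHA * (target - self.q[(s, a)])

    def act(self, s):
        self._flush(s)  # on revisit, first update with the stored experience
        if random.random() < EPS:
            return random.choice(self.actions)
        return max(self.actions, key=lambda b: self.q[(s, b)])

    def observe(self, s, a, r, s_next):
        # Postpone: record the last-visit experience instead of updating now.
        self.pending[s] = (a, r, s_next)

    def finish_episode(self):
        # Flush any remaining postponed updates at episode end.
        for s in list(self.pending):
            self._flush(s)
```

Note that, unlike eligibility traces, this sketch introduces no extra tunable parameter; it only adds a per-state buffer holding the most recent transition.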