In reinforcement learning, all objective functions are not equal

  • Romain Laroche,
  • Harm van Seijen

Proceedings of the 6th International Conference on Learning Representations (ICLR) - workshop track

We study the learnability of value functions. To take reward back-propagation out of the picture, we directly fit a deep neural network to the analytically computed optimal value function for a chosen objective function. We show that some objective functions are easier to train on than others, by several orders of magnitude. In particular, we observe the influence of the discount factor γ and of the decomposition of the task into subtasks.
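A minimal sketch of this kind of setup (not the authors' exact protocol; the chain MDP, network architecture, and hyperparameters below are illustrative assumptions): compute the optimal value function analytically via value iteration, then regress a small neural network onto it with plain supervised learning.

```python
import numpy as np
import torch
import torch.nn as nn

# --- Toy MDP (assumed for illustration): a deterministic chain with a reward
# at the right end; actions move left or right, clipped at the edges ----------
n_states, gamma = 20, 0.9
rewards = np.zeros(n_states)
rewards[-1] = 1.0

# Value iteration gives the "analytically computed" optimal value function
V = np.zeros(n_states)
idx = np.arange(n_states)
right = np.minimum(idx + 1, n_states - 1)
left = np.maximum(idx - 1, 0)
for _ in range(10_000):
    V_new = rewards + gamma * np.maximum(V[right], V[left])
    delta = np.max(np.abs(V_new - V))
    V = V_new
    if delta < 1e-10:
        break

# --- Supervised regression of a network onto the optimal values --------------
X = torch.linspace(0.0, 1.0, n_states).unsqueeze(1)    # normalised state index
y = torch.tensor(V, dtype=torch.float32).unsqueeze(1)  # analytic targets

net = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(5_000):
    opt.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:5d}  mse {loss.item():.2e}")
```

Under this framing, comparing how quickly the regression loss decreases for value functions induced by different objective functions (or different γ) is one way to probe the learnability gap the abstract describes.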