Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

  • Qiuyuan Huang ,
  • Zhe Gan ,
  • Asli Celikyilmaz ,
  • Oliver Wu ,
  • Jianfeng Wang ,
  • Xiaodong He

AAAI 2019 |

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a flat deep reinforcement learning baseline.