Improving Event Extraction via Multimodal Integration
- Tongtao Zhang,
- Spencer Whitehead,
- Hanwang Zhang,
- Hongzhi Li,
- Joseph Ellis,
- Lifu Huang,
- Wei Liu,
- Heng Ji,
- Shih-Fu Chang
Proceedings of the 25th ACM International Conference on Multimedia |
Published by ACM
In this paper, we focus on improving Event Extraction (EE) by
incorporating visual knowledge with words and phrases from text
documents. We first discover visual patterns from large-scale
text-image pairs in a weakly-supervised manner and then propose a
multimodal event extraction algorithm where the event extractor is
jointly trained with textual features and visual patterns. Extensive
experimental results on benchmark data sets demonstrate that the
proposed multimodal EE method can achieve significantly better
performance on event extraction: an absolute 7.1% F-score gain on
event trigger labeling and an 8.5% F-score gain on event argument
labeling.
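To make the fusion idea concrete, here is a minimal sketch of jointly training a trigger classifier on concatenated textual and visual features. This is not the authors' model: the feature dimensions, the synthetic data, and the simple logistic-regression classifier are all illustrative assumptions standing in for the paper's learned textual features and discovered visual patterns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: each candidate trigger word has a textual
# feature vector (e.g., from a sentence encoder) and a visual-pattern
# score vector (e.g., similarities to discovered visual patterns).
n_samples, d_text, d_vis = 200, 16, 8
X_text = rng.normal(size=(n_samples, d_text))
X_vis = rng.normal(size=(n_samples, d_vis))
y = (rng.random(n_samples) < 0.5).astype(int)  # trigger vs. non-trigger

# Multimodal fusion by concatenation: one classifier is trained
# jointly on textual features and visual patterns.
X = np.concatenate([X_text, X_vis], axis=1)

# Minimal logistic-regression trainer (batch gradient descent).
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted trigger probability
    w -= lr * (X.T @ (p - y)) / n_samples
    b -= lr * float(np.mean(p - y))

# Predict trigger labels from the jointly trained multimodal classifier.
pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = float(np.mean(pred == y))
```

Concatenation is the simplest fusion strategy; the point is only that a single set of classifier weights sees both modalities during training, so visual evidence can shift trigger decisions that text alone would get wrong.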