Event Specific Multimodal Pattern Mining for Knowledge Base Construction
- Hongzhi Li
- Joseph G. Ellis
- Heng Ji
- Shih-Fu Chang
Proceedings of the 2016 ACM on Multimedia Conference
Published by ACM
Knowledge bases, which consist of a collection of entities, attributes, and the relations between them, are widely used and important for many information retrieval tasks. Knowledge base schemas are often constructed manually by experts with domain knowledge in the field of interest. Once the knowledge base is generated, many tasks such as automatic content extraction and knowledge base population can be performed, and these have been studied extensively by the Natural Language Processing community. However, current approaches ignore visual information that could be used to build or populate these structured ontologies. Preliminary work on visual knowledge base construction explores only a limited set of basic objects and scene relations. In this paper, we propose a novel multimodal pattern mining approach for constructing a high-level “event” schema semi-automatically, which can extend text-only methods for schema construction. We utilize a large unconstrained corpus of weakly supervised image-caption pairs related to high-level events such as “attack” and “demonstration” both to discover visual aspects of an event and to name these visual components automatically. We compare our method with several state-of-the-art visual pattern mining approaches and demonstrate that our proposed method achieves dramatic improvements in terms of the number of concepts discovered (33% gain), semantic consistency of visual patterns (52% gain), and correctness of pattern naming (150% gain).