Learning Semantic Correspondences from Noisy Data-text Pairs by Local-to-Global Alignments

The 28th International Conference on Computational Linguistics |

Organized by International Committee for Computational Linguistics

Publication

Learning semantic correspondence between the structured data (e.g., slot-value pairs) and associated texts is a core problem for many downstream NLP applications, e.g., data-to-text generation. Recent neural generation methods require to use large scale training data. However, the collected data-text pairs for training are usually loosely corresponded, where texts contain additional or contradicted information compare to its paired input. In this paper, we propose a local-to-global alignment (L2GA) framework to learn semantic correspondences from loosely related data-text pairs. First, a local alignment model based on multi-instance learning is applied to build the semantic correspondences within a data-text pair. Then, a global alignment model built on top of a memory guided conditional random field (CRF) layer is designed to exploit dependencies among alignments in the entire training corpus, where the memory is used to integrate the alignment clues provided by the local alignment model. Therefore, it is capable of inducing missing alignments for text spans that are not supported by its imperfect paired input. Experiments on recent restaurant dataset show that our proposed method can improve the alignment accuracy and as a by product, our method is also applicable to induce semantically equivalent training data-text pairs for neural generation models.