GLOW: Global Weighted Self-Attention Network for Web Search

  • Xuan Shan,
  • Chuanjie Liu,
  • Yiqian Xia,
  • Yusi Zhang,
  • Kaize Ding,
  • Angen Luo,
  • Yuxiang Luo

2021 International Conference on Big Data


Deep matching models aim to help search engines retrieve more relevant documents by mapping queries and documents into semantic vectors during first-stage retrieval. When BERT is used as the deep matching model, the attention score between two words is built solely on local contextualized word embeddings. It lacks prior global knowledge to distinguish the importance of different words, which has been proven to play a critical role in information retrieval tasks. In addition, BERT only performs attention across sub-word tokens, which weakens whole-word attention representation. We propose a novel Global Weighted Self-Attention (GLOW) network for web document search. GLOW fuses global corpus statistics into the deep matching model. By adding prior weights derived from global information, such as BM25, into attention generation, GLOW learns weighted attention scores jointly with the query matrix Q and the key matrix K. We also present an efficient whole-word weight sharing solution that brings prior whole-word knowledge into sub-word-level attention, helping the Transformer learn whole-word-level attention. To make our models applicable to complicated web search scenarios, we introduce a combined fields representation that accommodates documents with multiple fields, even with a variable number of instances. We demonstrate that GLOW is more effective at capturing topical and semantic representations in both queries and documents. Intrinsic evaluation and experiments conducted on public data sets show GLOW to be a general framework for document retrieval tasks. It significantly outperforms BERT and other competitive baselines by a large margin while retaining the same model complexity as BERT. The source code is available at https://github.com/GLOW-deep/GLOW.
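
As a rough illustration of the core idea in the abstract, the sketch below shows one plausible way to inject prior global term weights (e.g., IDF- or BM25-derived) into scaled dot-product attention, with a single prior weight shared across all sub-words of the same whole word. The function names, the additive form of the bias, and the `word_ids` mapping are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Illustrative sketch only: an assumed additive formulation of globally
# weighted self-attention; the paper's exact formulation may differ.
import math
import torch
import torch.nn.functional as F

def glow_attention(Q, K, V, global_weights):
    """Scaled dot-product attention biased by prior global term weights.

    Q, K, V:        (batch, seq_len, d) projected token representations.
    global_weights: (batch, seq_len) prior importance of each token, e.g.
                    derived from corpus statistics such as IDF or BM25.
    """
    d = Q.size(-1)
    logits = Q @ K.transpose(-2, -1) / math.sqrt(d)   # (batch, len, len)
    # Bias every key's logit by its global weight so that globally
    # important words attract more attention mass.
    logits = logits + global_weights.unsqueeze(1)     # broadcast over queries
    return F.softmax(logits, dim=-1) @ V

def share_whole_word_weights(word_weights, word_ids):
    """Whole-word weight sharing: each sub-word token inherits the prior
    weight of the whole word it belongs to.

    word_weights: (batch, num_words) one prior weight per whole word.
    word_ids:     (batch, seq_len) index of the whole word each sub-word
                  token belongs to (a hypothetical tokenizer output).
    """
    return torch.gather(word_weights, 1, word_ids)
```

Under this formulation, a rare query term would receive a larger logit bias than a stop word, matching the abstract's claim that prior global knowledge helps distinguish word importance, while the bias itself adds no extra parameters beyond the precomputed corpus statistics.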