OAG: Toward Linking Large-scale Heterogeneous Entity Graphs

  • Fanjin Zhang
  • Xiao Liu
  • Jie Tang
  • Yuxiao Dong
  • Peiran Yao
  • Jie Zhang
  • Xiaotao Gu
  • Yan Wang
  • Bin Shao
  • Rui Li
  • Kuansan Wang

ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)

Linking entities from different sources is a fundamental task in building open knowledge graphs. Despite extensive research in related fields, the challenges of linking large-scale heterogeneous entity graphs are far from resolved. Using two billion-scale academic entity graphs (Microsoft Academic Graph and AMiner) as the sources for our study, we propose a unified framework, LinKG, to address the problem of building a large-scale linked entity graph. LinKG comprises three linking modules, each of which addresses one category of entities. To link word-sequence-based entities (e.g., venues), we present a method based on long short-term memory networks that captures the dependencies between words. To link large-scale entities (e.g., papers), we combine locality-sensitive hashing and convolutional neural networks for scalable and precise linking. To link ambiguous entities (e.g., authors), we propose heterogeneous graph attention networks that model different types of entities. Our extensive experiments and systematic analysis demonstrate that LinKG achieves an F1-score of 0.9510 on the linking task, significantly outperforming the state of the art. LinKG has been deployed to Microsoft Academic Search and AMiner to integrate the two large graphs. We have published the linked results as the Open Academic Graph (OAG), the largest publicly available heterogeneous academic graph to date.
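
To make the role of locality-sensitive hashing in paper linking more concrete, the following is a minimal sketch of how hashing can narrow two large paper collections down to a small set of candidate pairs before a more precise model scores them. The abstract does not specify the hashing scheme, so MinHash over word trigrams with banding is assumed here purely for illustration. The function names (`shingles`, `minhash_signature`, `candidate_pairs`) and all parameter values are hypothetical and are not taken from LinKG.

```python
import hashlib
from collections import defaultdict


def shingles(title, k=3):
    """Return the set of lowercase word k-grams of a title (assumed tokenization)."""
    words = title.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_hashes=32):
    """MinHash signature: for each seed, keep the smallest 64-bit hash over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        ))
    return tuple(signature)


def candidate_pairs(titles_a, titles_b, num_hashes=32, bands=16):
    """Return (id_a, id_b) pairs whose signatures agree on at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(list)

    # Index every title from source A under each of its band keys.
    for id_a, title in titles_a.items():
        sig = minhash_signature(shingles(title), num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(id_a)

    # Probe the index with titles from source B; bucket collisions become candidates.
    pairs = set()
    for id_b, title in titles_b.items():
        sig = minhash_signature(shingles(title), num_hashes)
        for b in range(bands):
            for id_a in buckets.get((b, sig[b * rows:(b + 1) * rows]), ()):
                pairs.add((id_a, id_b))
    return pairs
```

Only the pairs returned by `candidate_pairs` would then need to be passed to a more expensive matcher, which is the general reason hashing helps at this scale: the number of comparisons grows with the number of bucket collisions rather than with the product of the two collection sizes.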