Microsoft Academic

Feature improvement: Related Publications

Some of us remember walking into a library to look for a book or journal article and leaving with an armful of books. Browsing the materials in physical proximity to the one we were looking for was a form of research, as it helped us discover related publications that we may not have come across otherwise. A lot of this serendipity is lost in online search, but we gladly give it up for the many other advantages it offers. With this week’s graph update, we bring the best of both worlds by enhancing our powerful semantic search with an improved Related Publications feature.

On Microsoft Academic, related publications are papers that are not necessarily cited by or citing a paper but are sufficiently relevant that readers interested in the original paper will likely be interested in these publications as well. You can access related publications from a paper’s detail page. Just click a publication’s title in the list of search results to navigate to its detail page, which currently has three sections: References, Citations, and Related Publications.

While related publications have existed on Microsoft Academic for a while, they are now much improved. First, related paper relevance is now truly semantic. Second, we increased the number of papers that have related publications listed on their detail page. Third, we increased the number of related publications for a paper in our graph.

Semantic relevance of Related Publications

The selection of papers in the Related Publications section has improved because we changed the method for identifying relevant papers and computing their similarity. Previously, we identified related publications using only citation data. For example, if paper A cited paper B, and paper C also cited paper B, then we could infer that readers interested in paper A were likely to also be interested in paper C. This approach had two drawbacks: accurate citation data is complicated to acquire, and citation data is inherently biased towards publications within the same research domain. We rarely see papers citing work outside their main research area, even when other research areas are working on the same problem.
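As a rough illustration of the citation-only idea in the example above, the sketch below scores pairs of papers by how much their reference lists overlap. The paper IDs, the `references` mapping, and the use of Jaccard overlap as the score are all assumptions made for illustration; this post does not describe how the earlier system actually scored citation overlap.

```python
from itertools import combinations

# Hypothetical toy data: each paper maps to the set of papers it cites.
references = {
    "A": {"B", "D"},
    "C": {"B", "E"},
    "F": {"G"},
}


def shared_reference_score(refs_x, refs_y):
    """Jaccard overlap of two reference sets (0.0 when both are empty)."""
    union = refs_x | refs_y
    if not union:
        return 0.0
    return len(refs_x & refs_y) / len(union)


def related_by_citation(references):
    """Return {(paper, paper): score} for every pair that shares a reference."""
    scores = {}
    for x, y in combinations(sorted(references), 2):
        score = shared_reference_score(references[x], references[y])
        if score > 0:
            scores[(x, y)] = score
    return scores


pair_scores = related_by_citation(references)
print(pair_scores)
```

Papers A and C are linked because both cite B, while F shares no references with either and so never appears as related. That is the domain bias described above in miniature: a paper from a field with a separate citation network stays invisible to this signal.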

The new method we are using is very different, and truly semantic. By using a technique known as word embedding, we transform meaning into mathematics. Word embeddings transform each word into a multi-dimensional vector. If two multi-dimensional vectors are similar, then the words they represent tend to be used in similar contexts and are likely synonyms. So, the vectors for car and automobile will be very similar. To calculate relevance among papers, we go beyond individual words and create an embedding for an entire document by using its title, abstract and keywords. The relevance among papers shown in Related Publications is now computed using a combination of co-citation data and document embeddings. Citation data provides us with a human behavior-based signal for relevance, while document embeddings provide a content-based signal. The results are lists of related publications that are more relevant to the original paper, even though they may not use the same words or come from the same research domain, because they present similar underlying concepts.
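The ideas in the paragraph above can be sketched in a few lines. Everything here is a simplified assumption for illustration: the three-dimensional `word_vectors` are made up, averaging word vectors is only one simple way to build a document embedding, and the weighted sum in `combined_relevance` is a hypothetical way to mix the two signals. The post does not say how Microsoft Academic builds its document embeddings or combines them with citation data.

```python
import math

# Hypothetical 3-dimensional word vectors. Real embeddings are learned from
# large text corpora and typically have hundreds of dimensions.
word_vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "flock": [0.0, 0.8, 0.6],
    "herd": [0.05, 0.75, 0.65],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def document_embedding(text):
    """Average the vectors of known words; None if no word is known.

    Assumption: `text` stands in for a paper's title, abstract and keywords.
    """
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def combined_relevance(citation_score, content_score, citation_weight=0.5):
    """Hypothetical blend of a citation-based and a content-based score."""
    return citation_weight * citation_score + (1 - citation_weight) * content_score


# Similar words get similar vectors; unrelated words do not.
print(cosine(word_vectors["car"], word_vectors["automobile"]))
print(cosine(word_vectors["car"], word_vectors["flock"]))

# Two documents with no words in common can still be close in embedding space.
doc_a = document_embedding("flock")
doc_b = document_embedding("herd")
print(cosine(doc_a, doc_b))
```

In this toy setup, the "flock" and "herd" documents share no vocabulary yet score as highly similar, which is the content-based signal that citation data alone cannot provide.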

Take, for example, the paper Flocks, herds and schools: A distributed behavioral model, about computer animation, which discusses animating the behavior of flocks and herds and has been tagged with fields of study such as Computer animation, Simulation, and Computer science.

Screenshot of paper Flocks, herds and schools: A distributed behavioral model showing fields of study tagged onto paper.

When we click “Related Publications,” the page scrolls down past the list of references and the papers citing this paper, to the Related Publications section:

Screenshot of list of Related Papers

As we browse related publications, we see the paper, Collective Memory and Spatial Sorting in Animal Groups, published in the Journal of Theoretical Biology and tagged with fields of study in Biology, Ecology and Self-organization, which are very different from the fields of study tagged onto our initial paper. This example shows how the improved Related Publications feature can help you discover relevant scholarship outside your own discipline.

Screenshot of paper, Collective Memory and Spatial Sorting in Animal Groups

Increased number of papers with Related Publications

Not all papers have related publications. This is a feature only available for papers in the English language, as word embeddings do not work well across languages. Even for English papers there are cases where paper relevance cannot be computed – for example, if the abstract is missing. There is also the case where the embedding generated for a document may not be easily clustered with other documents (an important step in computing relevance), which prevents us from finding related papers. Very recent papers published in the last 6 to 8 months do not yet have related publications identified with the new method. Overall, however, the number of papers for which we offer related publications has quadrupled in this release, from about 30 million to about 120 million.

We are experimenting with new technologies that have shown promise to further improve related publications, such as the NetMF algorithm we presented at WSDM 2018. Once we complete further tests and validation, we hope to bring even more power to related papers and, potentially, to other related entities on the site.

Higher number of Related Publications

The new method for calculating publication relevance enables us to generate a higher number of related publications. We cap the number of related publications at 20 on our website to ensure relevance quality and avoid overwhelming users. That said, users who want to see all the related publications for a paper are encouraged to use our graph for further analysis. The Microsoft Academic Graph can be accessed through the Academic Knowledge API, or, for power users, we offer another option through Azure Data Lake (please contact us if you are interested in the latter).
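The website cap described above amounts to keeping only the highest-scoring candidates. A minimal sketch, assuming each candidate is a hypothetical `(paper_id, score)` pair; the IDs and scores below are invented, and the site's actual ranking logic is not described in this post:

```python
import heapq

MAX_RELATED_ON_SITE = 20  # the cap stated in this post


def top_related(scored_candidates, limit=MAX_RELATED_ON_SITE):
    """Return the `limit` highest-scoring (paper_id, score) pairs, best first."""
    return heapq.nlargest(limit, scored_candidates, key=lambda pair: pair[1])


# Hypothetical candidates: 50 papers with made-up relevance scores.
candidates = [(f"paper-{i}", i / 100) for i in range(50)]
shown = top_related(candidates)
print(len(shown), shown[0], shown[-1])
```

The full candidate list stays available for analysis in the graph; only the displayed list is truncated.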

We hope these improvements enable you to discover more research faster, and that you find the serendipity of our improved Related Publications section useful and enjoyable.

How do you unleash the power of semantic search? As always, we would like to hear from you either through the feedback link at the bottom right of the website, or on Twitter. You can also find our project home page with this blog on the Microsoft Research site at aka.ms/msracad.

Happy researching!