The research published by the global scientific community constitutes a diary of humankind’s scientific achievements. As this output grows year after year, it creates new opportunities for further inquiry, along with new challenges in handling the volume and complexity of the information. As a result, scholarly big data has been the focus of a growing number of recent research workshops, such as:
- Workshop on Challenges & Issues on Scholarly Big Data Discovery and Collaboration (SBD 2014 at IEEE Big Data, October 2014)
- Scholarly Big Data: AI Perspectives, Challenges, and Ideas (at AAAI-15, January 2015)
- 2nd WWW Workshop on Big Scholarly Data: Towards the Web of Scholars (BigScholar at WWW 2015, May 2015)
- 4th International Workshop on Mining Scientific Publications (WOSP at JCDL 2015, June 2015)
For our part, Microsoft Research announced last summer that Microsoft Academic Search was evolving from a research project into full-scale production powered by Bing (see Making Cortana the Researcher’s Dream Assistant). In addition to integrating scholarly publications directly into Bing search results and Cortana’s notification system, we are taking full advantage of Bing’s capacity to crawl the web and generate structured information from unstructured text. Our Academic Graph of research publications, authors, journals, conferences, universities and fields of study has grown significantly, more than doubling the number of publication records of the previous iteration and offering nearly three times the number of citations between publications.
While our graph continues to grow, today we are announcing the release of a snapshot of this graph for the research community, in an effort to jumpstart new avenues of research at web scale. The Microsoft Academic Graph (MAG) can be used immediately. The data is stored as a set of text files: one for each entity type in the graph, and one for each relationship type between the entities (paper-paper citations, author-paper, paper-topic and so forth).
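To give a feel for working with this layout, here is a minimal sketch of reading one relationship file (paper-paper citations) into an in-memory structure. The column layout (citing paper ID, then cited paper ID), the tab delimiter, and the sample IDs are assumptions made for illustration only; check the documentation that ships with the data for the actual format.

```python
import io
from collections import defaultdict


def load_citations(lines, delimiter="\t"):
    """Build a citing -> [cited, ...] mapping from two-column text rows.

    Assumed (hypothetical) row layout: citing_paper_id <TAB> cited_paper_id.
    """
    cites = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        citing, cited = fields[0], fields[1]
        cites[citing].append(cited)
    return cites


def citation_counts(cites):
    """Count how many times each paper is cited."""
    counts = defaultdict(int)
    for cited_list in cites.values():
        for cited in cited_list:
            counts[cited] += 1
    return dict(counts)


# Tiny in-memory stand-in for a relationship file (made-up IDs).
sample = io.StringIO("P1\tP2\nP1\tP3\nP4\tP2\n")
graph = load_citations(sample)
counts = citation_counts(graph)
print(counts)
```

With a real file you would pass an open file handle instead of the `StringIO` object, so that rows are streamed one at a time. At the scale of the full graph, an in-memory dictionary like this may not fit on a single machine, so treat it as a starting point for exploring a subset.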
Professor Jevin West, of the University of Washington’s Information School, calls the MAG a game changer. “There has never been a release of bibliographic data at this scale,” he says. “It will allow researchers to study the structure of scientific knowledge, build better algorithms for mapping the ever-expanding corpus and improving information retrieval. I have been waiting for news like this for years. Let the research begin!”
You can download the MAG data directly from Microsoft Azure, or mount it from Azure blob storage into your own Azure virtual machine. Because of the size of the data, researchers may find it advantageous to use Microsoft’s scalable cloud infrastructure. To this end, we encourage researchers to apply for an Azure for Research award to support their work. Simply include #academicgraph in your award submission. The next deadline is August 15, 2015.
As Professor West says, let the research begin!
—Alex Wade, Director of Scholarly Communications, Microsoft Research