Microsoft Academic Graph (MAG) is updated weekly to keep up with the pace of scientific discoveries and technology advances. Just a few months ago, in our April 20, 2018 blog, we disclosed that the graph contained more than 173 million articles. As of July 31, 2018, that number had risen to more than 176 million. We update these numbers every time the graph is uploaded online.
Starting in August 2018, you’ll see a significant jump in the number of publications indexed in MAG, to more than 200 million. This is due to our revised approach to indexing patents.
Starting now, MAG will provide improved patent coverage thanks to a partnership with domain expert Lens.org. Lens.org is a public benefit company spun out from Cambia and the Queensland University of Technology in Australia. The Lens, including its predecessor Patent Lens, has been a free, open, and global full-text patent search site for over 18 years, serving over 110M patent records from more than 100 jurisdictions. Its vision of mapping the influence of academic literature in order to turn science into social outcomes is well articulated in two articles published in Nature (2017) and Nature Biotechnology (2018). With Lens.org’s domain expertise, we are able to provide a more holistic account of scientific discoveries as communicated through patents.
Patents have always been included as publications in MAG. Like academic papers, patents have abstracts, authors (often the inventors, though the text is usually written by their patent attorneys), and affiliations (often the institution or assignee of a patent). Patents cite other patents and scholarly articles, and such citations are used to assess the impact of the cited work for ranking purposes. Indeed, many inventions are published both as a patent disclosure and as a scholarly paper.
Patents, however, differ from scholarly papers in several respects and therefore require different treatment in MAG.
First, patents are jurisdiction-specific, so a single invention can be filed, published, and granted in multiple countries. Such duplication would be regarded as self-plagiarism for scholarly articles, but it is an absolute necessity and normal practice for patent applications. Because a single invention can produce multiple patents, we historically erred on the conservative side and excluded many patents from MAG. Now that we are working with Lens.org, we can use their concept of “patent family,” in which multiple patents derived from a single invention are recognized and grouped together. Just as multiple versions of the same article are grouped into a single publication record in MAG, this grouping enables us to identify inventions more accurately through their patent families and to assess the impact of those inventions.
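To make the idea of family grouping concrete, here is a minimal Python sketch. It assumes each patent record already carries a family identifier (as a source such as Lens.org could supply); the `PatentRecord` class, its fields, and the `group_by_family` and `family_citation_count` helpers are hypothetical illustrations, not the actual MAG or Lens.org schema or API. The sketch shows one plausible way to count impact once per invention: citations to any member of a family are pooled and deduplicated.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class PatentRecord:
    # Hypothetical fields for illustration only; not the real MAG/Lens.org schema.
    publication_number: str
    jurisdiction: str
    family_id: str  # assumed to be provided by the patent data source
    cited_by: FrozenSet[str] = frozenset()  # IDs of documents citing this patent


def group_by_family(records: Iterable[PatentRecord]) -> Dict[str, List[PatentRecord]]:
    """Group jurisdiction-specific patent records into one entry per family."""
    families: Dict[str, List[PatentRecord]] = defaultdict(list)
    for rec in records:
        families[rec.family_id].append(rec)
    return dict(families)


def family_citation_count(members: Iterable[PatentRecord]) -> int:
    """Count distinct citing documents across all members of one family,
    so a work citing two members of the same family is counted once."""
    citing: set = set()
    for rec in members:
        citing |= rec.cited_by
    return len(citing)


if __name__ == "__main__":
    records = [
        PatentRecord("US-0001", "US", "fam-1", frozenset({"paper-A", "paper-B"})),
        PatentRecord("JP-0001", "JP", "fam-1", frozenset({"paper-B", "paper-C"})),
        PatentRecord("EP-0002", "EP", "fam-2", frozenset({"paper-A"})),
    ]
    families = group_by_family(records)
    print(len(families))                             # 2 families (inventions)
    print(family_citation_count(families["fam-1"]))  # 3 distinct citing works
```

Pooling citations at the family level is one design choice among several; it avoids inflating an invention’s apparent impact simply because it was filed in many jurisdictions, which mirrors how grouping article versions avoids double counting.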
Second, while scholarly articles are often classified by their fields of study, patents are officially classified by the utility of the underlying invention. For example, inventions to treat a cancer may be based on internal medicine, radiation, gene therapy, or chemotherapy. However, because their utilities are the same, patents for these inventions often receive the same patent classification code. Cross-referencing the utility-based and the field-of-study-based classifications will give us a holistic view of how various scientific disciplines have paved the way for the technology developments that have been propelling civilization.
Additionally, patents have a highly formal syntax that is not designed to share knowledge efficiently with the public or with other scientists. This very structure, however, means that patents are uniquely positioned to benefit from machine learning and AI techniques that extract meaning and present it in useful ways. Luckily, it appears that the deep learning techniques we use to understand the contents of a document (as described in this blog) can be applied as a first pass to read the legalese of a patent disclosure and assign the fields of study mentioned in the invention. Going forward, we have identified several areas in which we can train our models to better decipher legalese-laden documents and read patent disclosures.
Beyond adding more patents to MAG, better patent coverage also changes the search result pages for many institutions and authors, especially those from industry. For example, the “before” and “after” views for Canon Inc. are shown below: not only are the top publications listed now dominated by patents, but the important authors in the left sidebar change significantly as well.
As another example, the number of patents related to “chemotherapy” (found by selecting the Patents publication type in the filter) has risen from 2,094 to 3,213, and the work from Johns Hopkins University, which was published both as a paper in Nature Medicine and as a patent, now properly takes the top search result, as shown below. Note the dramatic changes in the rankings of the author and affiliation filters as well.
The landing page for each publication is now enhanced with a pointer to the corresponding page at Lens.org. For example, the first search result above that is both a paper and a patent would have both the paper’s and Lens.org’s URLs in the Sources section.
Clicking on the link to Lens.org takes you to the pages (downloaded from Lens.org and shown below with permission) where the family of eight applications filed in the US, Japan, Canada, the World Intellectual Property Organization (WIPO), and the EU is aggregated, including its grant as an EU patent in November 2017, almost a decade after the scientific paper was first published in Nature Medicine.
Note that on Lens.org you can already see the institutions that are active in, and have been shaping, a specific technology. For example, the results for using chemotherapy to treat tumors on Lens.org currently look as follows (screenshot from the Lens.org web page, used with permission):
By weighting patent families and comprehensively linking these two massive datasets, you can now explore the scope of influence that scholarship has on enterprise. With this foundation, we expect to jointly develop the first comprehensive, open mapping facility and platform, called In4M (International Industry & Innovation Influence Mapping, pronounced “InFOURm”), which will let scholars, institutions, companies, and investors discover new opportunities and rank relative performance to guide improvements in delivering real-world outcomes to society.
We are excited about this additional information, as it will enable us to understand the world of innovation more deeply, including how various institutions choose to capitalize on their investments and what strategies they have taken. We look forward to helping you find more insights by mining Microsoft Academic Graph.
Happy researching!