Literome: extracting knowledge from biomedical publications

As any researcher knows, keeping up with scientific knowledge isn’t easy. This is especially true in the field of medical genetics, where advances in DNA sequencing technology have led to an exponential growth of genomics data. Such data hold the key to identifying disease genes and drug targets, because complex diseases inevitably stem from synergistic perturbations of pathways and other gene networks. Many of these interactions are known, but most of this knowledge resides in academic journals, the number of which has undergone its own exponential growth. It has thus become increasingly difficult for researchers to find relevant knowledge for genomic interpretation and to keep up with new genomics findings. Fortunately, help has arrived with the Literome Project.*

Literome is an automatic curation system that both extracts genomic knowledge from PubMed (one of the world’s largest repositories of medical and life science journal articles) and makes this knowledge available in the cloud, with a website to facilitate browsing, searching, and reasoning. Currently, Literome focuses on the two types of knowledge most pertinent to genomic medicine: directed genic interactions, such as pathways, and genotype-phenotype associations. Users can search for interacting genes and the nature of the interactions, as well as for diseases and drugs associated with a given gene or single nucleotide polymorphism (SNP). Users can also search for indirect connections between two entities; for example, they can check whether a gene and a disease might be linked by searching for known associations between an interacting gene and a related disease.
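To make the idea of an indirect connection concrete, here is a minimal Python sketch of a two-hop lookup: starting from a gene, it follows known interactions to neighboring genes and then collects the diseases associated with those neighbors. This is not Literome's actual implementation or API. The function name `indirect_links`, the record layout, and the placeholder identifiers (`GENE_A`, `DISEASE_X`, and so on) are all assumptions made for illustration, and the sketch ignores disease relatedness and interaction direction for simplicity.

```python
from collections import defaultdict

# Hypothetical toy records with placeholder names; not real Literome output.
# Each interaction is (regulator, target); each association is (gene, disease).
INTERACTIONS = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
ASSOCIATIONS = [("GENE_B", "DISEASE_X"), ("GENE_C", "DISEASE_Y")]


def indirect_links(gene, interactions, associations):
    """Return sorted (interacting_gene, disease) pairs that connect `gene`
    to a disease through exactly one interacting gene."""
    # Index interactions in both directions so that either partner counts
    # as a neighbor, regardless of which one is the regulator.
    neighbors = defaultdict(set)
    for source, target in interactions:
        neighbors[source].add(target)
        neighbors[target].add(source)

    # Index diseases by gene.
    diseases = defaultdict(set)
    for assoc_gene, disease in associations:
        diseases[assoc_gene].add(disease)

    # Skip diseases already directly associated with the query gene.
    direct = diseases.get(gene, set())
    return sorted(
        (neighbor, disease)
        for neighbor in neighbors.get(gene, ())
        for disease in diseases.get(neighbor, ())
        if disease not in direct
    )


if __name__ == "__main__":
    for via, disease in indirect_links("GENE_A", INTERACTIONS, ASSOCIATIONS):
        print(f"GENE_A may be linked to {disease} via {via}")
```

In this toy data, `GENE_A` has no direct disease association, but it interacts with `GENE_B`, which is associated with `DISEASE_X`, so the lookup surfaces that pair as a candidate link worth checking against the literature.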

Literome builds on Microsoft Research natural language processing (NLP) technology, extracting information from PubMed abstracts via our Statistical Parsing and Linguistic Analysis Toolkit (SPLAT), and uses the Microsoft Azure cloud platform to store, analyze, and disseminate the information.

Scientists can use Literome in a number of ways, from exploratory browsing, to corroborating or refuting new discoveries, to programmatically integrating pathways and genotype-phenotype associations for making discoveries from genomics data. Literome is freely available for noncommercial use through an online service or downloadable web services. It is our hope that Literome will help researchers search for genomic medical findings that can lead to new understanding and treatment of genetically mediated diseases.

Hoifung Poon, Researcher, Microsoft Research

____________________
*The Literome Project is a joint project from Hoifung Poon, Chris Quirk, Charlie DeZiel, and David Heckerman of Microsoft Research.