MAIRA-2: Grounded Radiology Report Generation

Shruthi Bannur; Kenza Bouzid; Daniel Coelho de Castro; Anton Schwaighofer; Sam Bond-Taylor; Maximilian Ilse; Fernando Pérez-García; Valentina Salvatelli; Harshita Sharma; Felix Meissen; Mercy Ranjit; Shaury Srivastav; Julia Gong; Fabian Falck; Ozan Oktay; Anja Thieme; Matthew P Lungren; Maria Teodora Wetscherek; Javier Alvarez-Valle; Stephanie Hyland

MSR-TR-2024-18 | June 2024

Published by Microsoft

Radiology reporting is a complex task that requires detailed image understanding, integration of multiple inputs, including comparison with prior imaging, and precise language generation. This makes it ideal for the development and use of generative multimodal models. Here, we extend report generation to include the localisation of individual findings on the image – a task we call grounded report generation. Prior work indicates that grounding is important for clarifying image understanding and interpreting AI-generated text. Therefore, grounded reporting stands to improve the utility and transparency of automated report drafting. To enable evaluation of grounded reporting, we propose a novel evaluation framework – RadFact – leveraging the reasoning capabilities of large language models (LLMs). RadFact assesses the factuality of individual generated sentences, as well as correctness of generated spatial localisations when present. We introduce MAIRA-2, a large multimodal model combining a radiology-specific image encoder with an LLM, and trained for the new task of grounded report generation on chest X-rays. MAIRA-2 uses more comprehensive inputs than explored previously: the current frontal image, the current lateral image, the prior frontal image and prior report, as well as the Indication, Technique and Comparison sections of the current report. We demonstrate that these additions significantly improve report quality and reduce hallucinations, establishing a new state of the art on findings generation (without grounding) on MIMIC-CXR while demonstrating the feasibility of grounded reporting as a novel and richer task.
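As an illustration only, the inputs listed in the abstract can be gathered into a single record. The sketch below is not the MAIRA-2 interface: the class name `StudyInputs`, the field names, the use of file paths for images, and the choice of which fields are optional are all assumptions made for this example.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class StudyInputs:
    """Hypothetical container for the inputs named in the abstract.

    Field names and optionality are assumptions for illustration,
    not the released model's API.
    """

    current_frontal: str                   # path to the current frontal image
    current_lateral: Optional[str] = None  # path to the current lateral image
    prior_frontal: Optional[str] = None    # path to the prior frontal image
    prior_report: Optional[str] = None     # text of the prior report
    indication: Optional[str] = None       # Indication section, current report
    technique: Optional[str] = None        # Technique section, current report
    comparison: Optional[str] = None       # Comparison section, current report

    def available(self) -> List[str]:
        """Return the names of the inputs that are present, in field order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


# Example: a study with a prior image but no lateral view.
study = StudyInputs(
    current_frontal="frontal.png",
    prior_frontal="prior_frontal.png",
    indication="Shortness of breath.",
)
available_inputs = study.available()
```

Keeping every input other than the current frontal image optional reflects an assumption that studies do not always include a lateral view or prior imaging.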

Related Tools

RadFact: An LLM-based Evaluation Metric for AI-generated Radiology Reporting

November 21, 2024

RadFact is a framework for the evaluation of model-generated radiology reports given a ground-truth report, with or without grounding. Leveraging the logical inference capabilities of large language models, RadFact is not a single number but a suite of metrics, capturing aspects of precision and recall at text-only and text-and-grounding levels.
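The text-only precision and recall idea can be sketched as follows. This is a minimal sketch, not the RadFact implementation or its API: it assumes precision is the fraction of generated sentences supported by the ground-truth report and recall is the fraction of ground-truth sentences supported by the generated report. The function names are hypothetical, and `exact_match_judge` is a toy stand-in for the LLM-based inference step.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge signature: (sentence, premise sentences) -> True if supported.
EntailmentJudge = Callable[[str, Sequence[str]], bool]


def entailed_fraction(
    hypotheses: Sequence[str], premises: Sequence[str], judge: EntailmentJudge
) -> float:
    """Fraction of hypothesis sentences the judge considers supported by the premises."""
    if not hypotheses:
        return 0.0
    supported = sum(1 for sentence in hypotheses if judge(sentence, premises))
    return supported / len(hypotheses)


def logical_precision_recall(
    generated: Sequence[str], reference: Sequence[str], judge: EntailmentJudge
) -> Tuple[float, float]:
    """Text-only precision and recall under the assumptions stated above."""
    precision = entailed_fraction(generated, reference, judge)
    recall = entailed_fraction(reference, generated, judge)
    return precision, recall


def exact_match_judge(sentence: str, premises: Sequence[str]) -> bool:
    """Toy judge: a sentence is supported only if it appears verbatim (case-insensitive)."""
    normalised = {p.strip().lower() for p in premises}
    return sentence.strip().lower() in normalised


generated = ["No pleural effusion.", "Cardiomegaly is present."]
reference = [
    "No pleural effusion.",
    "There is a right lower lobe opacity.",
    "No pneumothorax.",
]
precision, recall = logical_precision_recall(generated, reference, exact_match_judge)
```

In this toy example one of the two generated sentences is supported by the reference, and one of the three reference sentences is supported by the generated report. A grounded variant would additionally compare the spatial localisations attached to supported sentences, which this sketch does not attempt.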

MAIRA-2 model

November 21, 2024

MAIRA-2 is a multimodal transformer designed for the generation of grounded or non-grounded radiology reports from chest X-rays. It is described in more detail in MAIRA-2: Grounded Radiology Report Generation (S. Bannur, K. Bouzid et al., 2024). MAIRA-2 has been built for research purposes only and is being shared to facilitate comparison and further research.
