A whole-slide foundation model for digital pathology from real-world data
- Hanwen Xu
- Naoto Usuyama
- Jaspreet Bagga
- Sheng Zhang
- Rajesh Rao
- Tristan Naumann
- Cliff Wong
- Zelalem Gero
- Javier González
- Yu Gu
- Yanbo Xu
- Mu-Hsin Wei
- Wenhui Wang
- Shuming Ma
- Furu Wei
- Jianwei Yang
- Chunyuan Li
- Jianfeng Gao
- Jaylen Rosemon
- Tucker Bower
- Soohee Lee
- R. Weerasinghe
- Bill Wright
- Ari Robicsek
- Brian Piening
- Carlo Bifulco
- Sheng Wang
- Hoifung Poon
Nature
Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres.
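To give a sense of the scale described above, the following minimal sketch works through the tile arithmetic. Only the tile size (256 × 256), the tile count (1.3 billion) and the slide count (171,189) come from the text; the slide dimensions of 50,000 × 40,000 pixels are a hypothetical example, and the function names are illustrative, not part of Prov-GigaPath.

```python
import math

TILE_SIZE = 256                 # tile edge in pixels, from the text
TOTAL_TILES = 1_300_000_000     # approximate pretraining tile count, from the text
TOTAL_SLIDES = 171_189          # whole-slide count, from the text


def tiles_in_grid(width_px: int, height_px: int, tile: int = TILE_SIZE) -> int:
    """Number of non-overlapping tiles needed to cover a slide.

    This counts every grid cell, including background; a real pipeline
    would typically discard tiles that contain little or no tissue.
    """
    cols = math.ceil(width_px / tile)
    rows = math.ceil(height_px / tile)
    return cols * rows


def mean_tiles_per_slide(total_tiles: int = TOTAL_TILES,
                         total_slides: int = TOTAL_SLIDES) -> float:
    """Average number of tiles per slide implied by the reported totals."""
    return total_tiles / total_slides


if __name__ == "__main__":
    # Hypothetical 2-gigapixel slide (assumed dimensions, not from the source).
    print(tiles_in_grid(50_000, 40_000))
    print(round(mean_tiles_per_slide()))
```

Under these assumptions, a single hypothetical slide already yields tens of thousands of grid cells, while the reported totals imply an average of several thousand retained tiles per slide. Both figures illustrate why modelling a whole slide means handling very long tile sequences rather than a small subsample.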