Towards a general-purpose foundation model for computational pathology

Cited by: 256
Authors
Chen, Richard J. [1, 2, 3, 4, 5]
Ding, Tong [1, 6]
Lu, Ming Y. [1, 2, 3, 4, 7]
Williamson, Drew F. K. [1, 2, 3]
Jaume, Guillaume [1, 2, 3, 4]
Song, Andrew H. [1, 2, 3, 4]
Chen, Bowen [1, 2]
Zhang, Andrew [1, 2, 3, 4, 8]
Shao, Daniel [1, 2, 3, 4, 8]
Shaban, Muhammad [1, 2, 3, 4]
Williams, Mane [1, 2, 3, 4, 5]
Oldenburg, Lukas [1]
Weishaupt, Luca L. [1, 2, 3, 4, 8]
Wang, Judy J. [1]
Vaidya, Anurag [1, 2, 3, 4, 8]
Le, Long Phi [2, 8]
Gerber, Georg [1]
Sahai, Sharifa [1, 2, 3, 4, 9]
Williams, Walt [1, 6]
Mahmood, Faisal [1, 2, 3, 4, 10]
Affiliations
[1] Harvard Med Sch, Brigham & Womens Hosp, Dept Pathol, Boston, MA 02115 USA
[2] Harvard Med Sch, Massachusetts Gen Hosp, Dept Pathol, Boston, MA 02115 USA
[3] Broad Inst Harvard & MIT, Canc Program, Cambridge, MA 02142 USA
[4] Dana Farber Canc Inst, Canc Data Sci Program, Boston, MA 02215 USA
[5] Harvard Med Sch, Dept Biomed Informat, Boston, MA USA
[6] Harvard Univ, Harvard John A Paulson Sch Engn & Appl Sci, Cambridge, MA USA
[7] Massachusetts Inst Technol MIT, Elect Engn & Comp Sci, Cambridge, MA USA
[8] Harvard MIT, Hlth Sci & Technol, Cambridge, MA USA
[9] Harvard Univ, Dept Syst Biol, Cambridge, MA USA
[10] Harvard Univ, Harvard Data Sci Initiat, Cambridge, MA 02138 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
SOMATIC GENOMIC LANDSCAPE; ARTIFICIAL-INTELLIGENCE; CANCER; ADENOCARCINOMAS; BIOPSIES; FEATURES; SYSTEM;
DOI
10.1038/s41591-024-02857-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks, requiring the objective characterization of histopathological entities from whole-slide images (WSIs). The high resolution of WSIs and the variability of morphological features present significant challenges, complicating the large-scale annotation of data for high-performance applications. To address this challenge, current efforts have proposed the use of pretrained image encoders through transfer learning from natural image datasets or self-supervised learning on publicly available histopathology datasets, but have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data) across 20 major tissue types. The model was evaluated on 34 representative CPath tasks of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient artificial intelligence models that can generalize and transfer to a wide range of diagnostically challenging tasks and clinical workflows in anatomic pathology.
Pages: 850-862
Number of pages: 13