Knowledge-enhanced visual-language pre-training on chest radiology images

Times Cited: 41
Authors
Zhang, Xiaoman [1 ,2 ]
Wu, Chaoyi [1 ,2 ]
Zhang, Ya [1 ,2 ]
Xie, Weidi [1 ,2 ]
Wang, Yanfeng [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai 200240, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
SYSTEM;
DOI
10.1038/s41467-023-40260-7
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biosciences]; N [General Natural Science];
Discipline Codes
07; 0710; 09;
Abstract
While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose an approach called Knowledge-enhanced Auto Diagnosis (KAD) which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully supervised models but also superior to the average of three expert radiologists for three (out of five) pathologies with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.

Despite the success of multi-modal foundation models in natural language and vision tasks, their use in medical domains is limited. Here, the authors propose to train a foundation model for chest X-ray diagnosis that combines medical domain knowledge with vision-language representation learning.
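The zero-shot evaluation described above follows the general CLIP-style recipe: embed the chest X-ray with an image encoder, embed each candidate pathology name with a text encoder, and score each pathology by image-query similarity. The following is a minimal sketch of that inference step only; the embedding dimension, pathology list, and the random vectors standing in for trained encoder outputs are illustrative assumptions, not KAD's actual architecture or knowledge-enhanced encoders.

```python
import numpy as np


def zero_shot_scores(image_emb, query_embs):
    """Cosine-similarity scores between one image embedding and a set of
    pathology-query embeddings (CLIP-style zero-shot classification)."""
    img = image_emb / np.linalg.norm(image_emb)
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    return q @ img  # one score per pathology query


# Stand-ins for encoder outputs (illustrative only; a real system would use
# trained image and text encoders on an X-ray and pathology-name prompts).
rng = np.random.default_rng(0)
dim = 128
pathologies = ["pneumothorax", "edema", "atelectasis",
               "cardiomegaly", "pleural effusion"]
image_emb = rng.normal(size=dim)
query_embs = rng.normal(size=(len(pathologies), dim))

scores = zero_shot_scores(image_emb, query_embs)
# Multi-label setting: an independent sigmoid per pathology, since several
# findings can co-occur on one radiograph.
probs = 1.0 / (1.0 + np.exp(-scores))
```

Because each pathology is scored independently, adding a new finding at inference time only requires embedding its name (or knowledge-derived description) as an extra query, with no retraining.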
Pages: 12