MedFILIP: Medical Fine-Grained Language-Image Pre-Training

Times cited: 0
Authors
Liang, Xinjie [1]
Li, Xiangyu [1]
Li, Fanding [1]
Jiang, Jie [1]
Dong, Qing [2]
Wang, Wei [3]
Wang, Kuanquan [1]
Dong, Suyu [4]
Luo, Gongning [5]
Li, Shuo [6, 7]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Harbin Med Univ, Affiliated Hosp 4, Dept Thorac Surg, Harbin 150088, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518071, Peoples R China
[4] Northeast Forestry Univ, Coll Comp & Control Engn, Harbin 150040, Peoples R China
[5] King Abdullah Univ Sci & Technol, Comp Elect & Math Sci & Engn Div, Thuwal 23955, Saudi Arabia
[6] Case Western Reserve Univ, Dept Biomed Engn, Cleveland, OH 44106 USA
[7] Case Western Reserve Univ, Dept Comp & Data Sci, Cleveland, OH 44106 USA
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Diseases; Medical diagnostic imaging; Visualization; Contrastive learning; Feature extraction; Data mining; Training; Large language models; Complexity theory; Bioinformatics; CXR imaging; fine-grained; interpretability; vision-language pretraining;
DOI
10.1109/JBHI.2025.3528196
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Medical vision-language pretraining (VLP) that leverages naturally paired medical image-report data is crucial for medical image analysis. However, existing methods struggle to accurately characterize the associations between images and diseases, leading to inaccurate or incomplete diagnostic results. In this work, we propose MedFILIP, a fine-grained VLP model that introduces medical image-specific knowledge through contrastive learning. Specifically: 1) An information extractor based on a large language model is proposed to decouple comprehensive disease details from reports; through flexible prompt engineering it excels at extracting disease details, effectively reducing text complexity while retaining rich information at a small cost. 2) A knowledge injector is proposed to construct relationships between categories and visual attributes, which helps the model make judgments based on image features and fosters knowledge extrapolation to unfamiliar disease categories. 3) A semantic similarity matrix based on fine-grained annotations is proposed, providing smoother, more informative labels and thus allowing fine-grained image-text alignment. 4) We validate MedFILIP on numerous datasets, e.g., RSNA-Pneumonia, NIH ChestX-ray14, VinBigData, and COVID-19. For single-label, multi-label, and fine-grained classification, our model achieves state-of-the-art performance, improving classification accuracy by up to 6.69%.
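The abstract does not say how the semantic similarity matrix is computed or how it enters the training loss. As a minimal sketch under stated assumptions, the general idea of replacing one-hot contrastive labels with similarity-based soft labels can be illustrated in plain Python. Everything here is hypothetical and not taken from the MedFILIP paper: the (disease, location, severity) annotation format, Jaccard overlap as the similarity measure, row normalization into target distributions, the CLIP-style symmetric cross-entropy, the temperature of 0.07, and the helper names `jaccard`, `soft_targets`, and `soft_contrastive_loss`.

```python
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical fine-grained annotation: (disease, location, severity).
Annotation = Tuple[str, str, str]


def jaccard(a: Set[Annotation], b: Set[Annotation]) -> float:
    """Annotation-set overlap; two empty sets (e.g. 'no finding') count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def soft_targets(annotations: Sequence[Set[Annotation]]) -> List[List[float]]:
    """Row-normalised semantic similarity matrix used as soft labels (assumed form).

    The diagonal is always 1.0, so every row sum is at least 1 and the
    division is safe. Because Jaccard is symmetric, row k is also the
    target distribution for text k over the images in the batch.
    """
    n = len(annotations)
    sim = [[jaccard(annotations[i], annotations[j]) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in sim]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def log_softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def soft_cross_entropy(targets: Sequence[float], logits: Sequence[float]) -> float:
    """Cross-entropy between a target distribution and softmax(logits)."""
    return -sum(t * lp for t, lp in zip(targets, log_softmax(logits)))


def soft_contrastive_loss(img, txt, targets, temperature: float = 0.07) -> float:
    """Symmetric image-text contrastive loss with soft (similarity-based) labels."""
    n = len(img)
    logits = [[cosine(img[i], txt[j]) / temperature for j in range(n)] for i in range(n)]
    # Image i against all texts: row i of the logit matrix.
    i2t = sum(soft_cross_entropy(targets[i], logits[i]) for i in range(n)) / n
    # Text j against all images: column j of the logit matrix.
    t2i = sum(
        soft_cross_entropy(targets[j], [logits[i][j] for i in range(n)]) for j in range(n)
    ) / n
    return (i2t + t2i) / 2


if __name__ == "__main__":
    batch = [
        {("pneumonia", "right lower lobe", "mild")},
        {("pneumonia", "right lower lobe", "mild")},
        {("effusion", "left", "moderate")},
    ]
    targets = soft_targets(batch)
    print(targets)  # [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
    emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(soft_contrastive_loss(emb, emb, targets))
```

In this toy batch the first two studies share identical findings. With one-hot labels, a standard contrastive loss would push each of those images away from the other's report even though the reports describe the same disease; the soft targets instead split the probability mass between them, which is one plausible reading of the "smoother, more informative labels" described above.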
Pages: 3587–3597
Number of pages: 11