Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-training

Cited: 6
Authors
Chen, Xiaofei [1]
He, Yuting [1]
Xue, Cheng [1]
Ge, Rongjun [2]
Li, Shuo [3]
Yang, Guanyu [1,4,5]
Affiliations
[1] Southeast Univ, Minist Educ, Key Lab New Generat Artificial Intelligence Techn, Nanjing, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Nanjing, Peoples R China
[3] Case Western Reserve Univ, Dept Biomed Engn, Cleveland, OH 44106 USA
[4] Southeast Univ, Joint Int Res Lab Med Informat Proc, Nanjing 210096, Peoples R China
[5] Ctr Rech Informat Biomed Sino Francais CRIBs, Nanjing, Peoples R China
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT I | 2023, Vol. 14220
DOI: 10.1007/978-3-031-43907-0_39
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Foundation models built on pre-training technology have significantly advanced artificial intelligence from theoretical research to practical application, making computer-aided diagnosis feasible for widespread use. Medical contrastive vision-language pre-training, which requires no human annotations, is an effective approach for guiding representation learning with the descriptive information in diagnostic reports. However, its effectiveness is limited by large-scale semantic overlap and semantic shifting in the medical domain. To address these issues, we propose the Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo), which integrates clinical knowledge into the learning of vision-language semantic consistency. The framework uses an unbiased, open-set, sample-wise knowledge representation to measure negative-sample noise and to supplement the correspondence between vision-language mutual information and clinical knowledge. Extensive experiments on eight tasks, including classification, segmentation, retrieval, and semantic relatedness, validate the effect of our framework, which achieves comparable or better performance under zero-shot and few-shot settings. Our code is available at https://github.com/ChenXiaoFei-CS/KoBo.
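The framework described in the abstract builds on image-text contrastive pre-training. As a rough, generic illustration only, the baseline objective such methods share can be sketched as a CLIP-style symmetric InfoNCE loss; this is not KoBo's knowledge-boosted loss, and the function name and temperature value below are illustrative assumptions:

```python
import numpy as np

def symmetric_info_nce(img_emb, txt_emb, temperature=0.07):
    """Generic CLIP-style symmetric contrastive loss over a batch of
    paired image/text embeddings. Row i of each matrix is a matched
    pair (the positive); every other pairing in the batch is a negative."""
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (N, N) similarity matrix
    n = logits.shape[0]

    def cross_entropy(lg):
        # softmax cross-entropy with the diagonal (matched pair) as target
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_prob[np.arange(n), np.arange(n)].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

The off-diagonal terms are the negatives; per the abstract, KoBo's contribution is precisely to use clinical knowledge to account for noisy negatives caused by semantic overlap, which this baseline sketch does not model.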
Pages: 405-415 (11 pages)
References (28 total)
[1] Alsentzer E., 2019, Proc. 2nd Clinical Natural Language Processing Workshop, p. 72. DOI: 10.18653/v1/W19-1909
[2] Bockting C., van Dis E. A. M., Bollen J., van Rooij R., Zuidema W. L. ChatGPT: five priorities for research. Nature, 2023, 614(7947): 224-226
[3] Bommasani R., 2021, arXiv. DOI: 10.48550/arXiv.2108.07258
[4] Chen Z., Li G., Wan X. Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge. Proc. 30th ACM International Conference on Multimedia (MM 2022), 2022: 5152-5161
[5] Chen Z., Du Y., Hu J., Liu Y., Li G., Wan X., Chang T.-H. Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training. MICCAI 2022, Pt V, vol. 13435: 679-689
[6] Desai S., 2020, IEEE Winter Conference on Applications of Computer Vision (WACV), p. 972. DOI: 10.1109/WACV45572.2020.9093360
[7] Dosovitskiy A., 2021, Proc. 9th International Conference on Learning Representations (ICLR)
[8] Johnson A. E. W., 2019, arXiv:1901.07042
[9] He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 770-778
[10] He Y., Yang G., Ge R., Chen Y., Coatrieux J.-L., Wang B., Li S. Geometric Visual Similarity Learning in 3D Medical Image Self-supervised Pre-training. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): 9538-9547