MCPL: Multi-Modal Collaborative Prompt Learning for Medical Vision-Language Model

Cited by: 3
Authors
Wang, Pengyu [1 ]
Zhang, Huaqi [2 ]
Yuan, Yixuan [1 ]
Affiliations
[1] Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China
[2] School of Computer Science and Technology, Beijing Jiaotong University, Beijing 100091, China
Keywords
Task analysis; Collaboration; Adaptation models; Medical diagnostic imaging; Pipelines; Computational modeling; Pathology; Multi-modal prompting; prompt collaboration; vision-language model; medical reports and images
DOI
10.1109/TMI.2024.3418408
CLC number
TP39 [Computer Applications]
Discipline codes
081203; 0835
Abstract
Multi-modal prompt learning is a high-performance, cost-effective learning paradigm that learns both text and image prompts to tune pre-trained vision-language (V-L) models such as CLIP for multiple downstream tasks. However, recent methods typically treat text and image prompts as independent components, ignoring the dependency between them. Moreover, extending multi-modal prompt learning to the medical field is challenging because of the significant gap between general- and medical-domain data. To this end, we propose a Multi-modal Collaborative Prompt Learning (MCPL) pipeline that tunes a frozen V-L model to align medical text-image representations, thereby supporting medical downstream tasks. We first construct an anatomy-pathology (AP) prompt that joins the text and image prompts in multi-modal prompting. The AP prompt introduces instance-level anatomy and pathology information, helping the V-L model better comprehend medical reports and images. Next, we propose a graph-guided prompt collaboration module (GPCM), which explicitly establishes multi-way couplings between the AP, text, and image prompts, enabling the prompts to be produced and updated collaboratively for more effective prompting. Finally, we develop a novel prompt configuration scheme that attaches the AP prompt to the query and key, and the text/image prompt to the value, in self-attention layers, improving the interpretability of multi-modal prompts. Extensive experiments on numerous medical classification and object detection datasets show that the proposed pipeline achieves excellent effectiveness and generalization. Compared with state-of-the-art prompt learning methods, MCPL provides a more reliable multi-modal prompting paradigm that reduces the tuning cost of V-L models on medical downstream tasks. Code: https://github.com/CUHK-AIM-Group/MCPL.
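As a reading aid, here is a minimal PyTorch sketch of the two mechanisms named in the abstract, written from the abstract alone rather than from the released code (see the GitHub link above). The dense-attention coupling standing in for GPCM, the class names, and all shapes and hyperparameters are assumptions; only the prompt placement rule (AP prompt to query/key, text/image prompt to value) comes from the text.

import torch
import torch.nn as nn

class GraphPromptCollaboration(nn.Module):
    # Stand-in for GPCM: treat the AP, text, and image prompt sets as nodes
    # of a fully connected graph and let every prompt attend to all others,
    # so the three sets are produced and updated collaboratively.
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.coupling = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, ap, txt, img):
        nodes = torch.cat([ap, txt, img], dim=1)      # (B, P_ap+P_txt+P_img, D)
        msg, _ = self.coupling(nodes, nodes, nodes)   # multi-way coupling
        nodes = self.norm(nodes + msg)
        return nodes.split([ap.size(1), txt.size(1), img.size(1)], dim=1)

class PromptedSelfAttention(nn.Module):
    # Prompt configuration scheme from the abstract: the AP prompt is attached
    # to the query and key streams, the text/image prompt to the value stream.
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x, ap_prompt, modal_prompt):
        # ap_prompt and modal_prompt must share length P so keys/values align.
        qk = torch.cat([ap_prompt, x], dim=1)         # AP prompt -> query, key
        v = torch.cat([modal_prompt, x], dim=1)       # text/image prompt -> value
        out, _ = self.attn(qk, qk, v)
        return out[:, ap_prompt.size(1):]             # keep original token positions

Note that attaching different prompts to key and value only works because both streams receive the same number of prompt tokens: attention requires keys and values of equal sequence length, which is why the AP and text/image prompts share a length P in this sketch.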
Pages: 4224-4235 (12 pages)