Multi-modal Adapter for Medical Vision-and-Language Learning

Cited by: 1

Authors:

Yu, Zheng [1]

Qiao, Yanyuan [1]

Xie, Yutong [1]

Wu, Qi [1]

Affiliations:

[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia

Source:

MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I | 2024 / Vol. 14348

Keywords:

Medical Vision-and-Language Learning; Parameter-Efficient Transfer Learning; Multi-Modal Adapter; MODEL;

DOI:

10.1007/978-3-031-45673-2_39

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, medical vision-and-language learning has attracted great attention from the biomedical community. Thanks to the development of large pre-trained models, performance on medical multi-modal learning benchmarks has improved greatly. However, because model sizes have grown rapidly, fully fine-tuning these large pre-trained models has become costly: a huge set of parameters must be trained and stored for each downstream task. We therefore propose a parameter-efficient transfer learning method named Medical Multi-Modal Adapter (M³AD) to mitigate this problem. We select the state-of-the-art M³AE model as our baseline, which is pre-trained on 30k medical image-text pairs with multiple proxy tasks and has about 340M parameters. Specifically, we first insert general adapters after the multi-head attention layers and feed-forward layers in all transformer blocks of M³AE. We then design a modality-fusion adapter that adopts multi-head attention mechanisms and insert it in the cross-modal encoder to enhance multi-modal interactions. In contrast to full fine-tuning, we freeze most parameters in M³AE and train only the inserted adapters, which are much smaller. Extensive experimental results on three medical visual question answering datasets and one medical multi-modal classification dataset demonstrate the effectiveness of our proposed method: the adapter-based model achieves performance competitive with full fine-tuning while using far fewer trainable parameters and much less memory.
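
To give a rough sense of why training only inserted adapters is cheap, the sketch below counts the parameters of a standard bottleneck adapter (a down-projection followed by an up-projection) placed after the attention and feed-forward layers of every transformer block. Only the backbone size of about 340M comes from the abstract. The hidden size, bottleneck size, and block count are hypothetical values chosen for illustration, the bottleneck design itself is an assumption about what "general adapters" means, and the modality-fusion adapter is not modelled.

```python
# Illustrative arithmetic only. Apart from the ~340M backbone size quoted in
# the abstract, every constant below is an assumption, not a value from the paper.
TOTAL_BACKBONE_PARAMS = 340_000_000  # "about 340M parameters" (abstract)
HIDDEN_DIM = 768        # assumed transformer hidden size
BOTTLENECK_DIM = 64     # assumed adapter bottleneck size
NUM_BLOCKS = 24         # assumed number of transformer blocks
ADAPTERS_PER_BLOCK = 2  # one after attention, one after feed-forward (abstract)


def bottleneck_adapter_params(hidden_dim: int, bottleneck_dim: int) -> int:
    """Count weights and biases of a down-projection plus an up-projection."""
    down = hidden_dim * bottleneck_dim + bottleneck_dim
    up = bottleneck_dim * hidden_dim + hidden_dim
    return down + up


def adapter_budget(total: int, hidden: int, bottleneck: int,
                   blocks: int, per_block: int) -> tuple[int, float]:
    """Return (adapter parameter count, share of all parameters that is trainable)."""
    adapter_total = bottleneck_adapter_params(hidden, bottleneck) * blocks * per_block
    # The backbone is frozen, so only the adapter parameters receive gradients.
    return adapter_total, adapter_total / (total + adapter_total)


adapter_total, fraction = adapter_budget(
    TOTAL_BACKBONE_PARAMS, HIDDEN_DIM, BOTTLENECK_DIM, NUM_BLOCKS, ADAPTERS_PER_BLOCK
)
print(f"adapter parameters: {adapter_total:,}")
print(f"trainable share: {fraction:.2%}")
```

Under these assumed sizes the adapters amount to a small single-digit percentage of the model, which is consistent in spirit with the abstract's claim of far fewer trained parameters; the actual figures depend on the configuration used in the paper.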


Pages: 393-402

Page count: 10
