Multi-modal Adapter for Medical Vision-and-Language Learning
Cited by: 1
Authors:
Yu, Zheng [1]; Qiao, Yanyuan [1]; Xie, Yutong [1]; Wu, Qi [1]
Affiliations:
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
Source:
MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I | 2024 / Vol. 14348
Keywords:
Medical Vision-and-Language Learning;
Parameter-Efficient Transfer Learning;
Multi-Modal Adapter;
MODEL;
DOI: 10.1007/978-3-031-45673-2_39
CLC Number:
TP18 [Artificial Intelligence Theory];
Discipline Codes:
081104; 0812; 0835; 1405
Abstract:
Recently, medical vision-and-language learning has attracted great attention from the biomedical community. Thanks to the development of large pre-trained models, performance on medical multi-modal learning benchmarks has improved greatly. However, due to the rapid growth in model size, fully fine-tuning these large pre-trained models has become costly: a full set of parameters must be trained and stored for each downstream task. We therefore propose a parameter-efficient transfer learning method named Medical Multi-Modal Adapter (M(3)AD) to mitigate this problem. We select the state-of-the-art M(3)AE model as our baseline, which is pre-trained on 30k medical image-text pairs with multiple proxy tasks and has about 340M parameters. Specifically, we first insert general adapters after the multi-head attention layers and feed-forward layers in all transformer blocks of M(3)AE. We then design a modality-fusion adapter that adopts a multi-head attention mechanism and insert it into the cross-modal encoder to enhance the multi-modal interactions. In contrast to full fine-tuning, we freeze most parameters in M(3)AE and train only the inserted adapters, which are much smaller. Extensive experimental results on three medical visual question answering datasets and one medical multi-modal classification dataset demonstrate the effectiveness of the proposed method: M(3)AD achieves performance competitive with full fine-tuning while requiring far fewer trainable parameters and less memory.
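The abstract describes bottleneck adapters inserted after the attention and feed-forward sublayers of each transformer block, plus a multi-head-attention "modality-fusion" adapter in the cross-modal encoder. Below is a minimal PyTorch sketch of that general pattern, assuming the standard bottleneck adapter design (down-projection, non-linearity, up-projection, residual); all class names, dimensions, and placements here are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Near-identity init so training starts close to the frozen backbone.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class FusionAdapter(nn.Module):
    """Hypothetical modality-fusion adapter: cross-attention from one
    modality's tokens to the other's, with a residual connection."""
    def __init__(self, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        h, _ = self.cross_attn(x, other, other)
        return x + h

class AdaptedTransformerBlock(nn.Module):
    """Pre-norm transformer block with adapters after the attention and
    feed-forward sublayers, as the abstract describes for general adapters."""
    def __init__(self, hidden_dim: int = 768, num_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.adapter_attn = Adapter(hidden_dim)  # trainable
        self.adapter_ffn = Adapter(hidden_dim)   # trainable

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)
        x = x + self.adapter_attn(h)
        x = x + self.adapter_ffn(self.ffn(self.norm2(x)))
        return x

# Freeze the backbone and train only adapter parameters; this selective
# training is the source of the parameter and memory savings reported.
block = AdaptedTransformerBlock()
for name, p in block.named_parameters():
    p.requires_grad = "adapter" in name
```

In this sketch only the adapter weights receive gradients, so the trainable-parameter count is a small fraction of the roughly 340M backbone parameters, which matches the parameter-efficiency claim in the abstract.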
Pages: 393-402
Page count: 10