Multi-modal Adapter for Medical Vision-and-Language Learning

Cited by: 1
Authors
Yu, Zheng [1 ]
Qiao, Yanyuan [1 ]
Xie, Yutong [1 ]
Wu, Qi [1 ]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
Source
MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I | 2024, Vol. 14348
Keywords
Medical Vision-and-Language Learning; Parameter-Efficient Transfer Learning; Multi-Modal Adapter; MODEL;
DOI
10.1007/978-3-031-45673-2_39
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, medical vision-and-language learning has attracted great attention from the biomedical community. Thanks to the development of large pre-trained models, performance on medical multi-modal learning benchmarks has improved substantially. However, as model sizes grow rapidly, fully fine-tuning these large pre-trained models has become costly: the entire parameter set must be trained and stored for each downstream task. We therefore propose a parameter-efficient transfer learning method named Medical Multi-Modal Adapter (M³AD) to mitigate this problem. We select the state-of-the-art M³AE model as our baseline, which is pre-trained on 30k medical image-text pairs with multiple proxy tasks and has about 340M parameters. Specifically, we first insert general adapters after the multi-head attention and feed-forward layers in all transformer blocks of M³AE. We then design a modality-fusion adapter that adopts multi-head attention mechanisms and insert it into the cross-modal encoder to enhance multi-modal interactions. In contrast to full fine-tuning, we freeze most parameters of M³AE and train only the inserted adapters, which are much smaller. Extensive experiments on three medical visual question answering datasets and one medical multi-modal classification dataset demonstrate the effectiveness of the proposed method: M³AD achieves performance competitive with full fine-tuning while requiring far fewer trainable parameters and less memory.
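The general adapters described above typically follow the standard bottleneck design from the parameter-efficient transfer learning literature: project the hidden state down to a small dimension, apply a nonlinearity, project back up, and add a residual connection. The sketch below illustrates this idea and the resulting parameter savings; the dimensions, initialization, and class name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class BottleneckAdapter:
    """Generic bottleneck adapter (hypothetical sketch):
    down-projection -> ReLU -> up-projection -> residual add.
    Only these two small matrices would be trained; the host
    transformer's weights stay frozen."""

    def __init__(self, d_model: int, bottleneck: int, rng: np.random.Generator):
        # Small random down-projection; zero-initialized up-projection
        # makes the adapter start as an identity map (a common choice).
        self.w_down = rng.standard_normal((d_model, bottleneck)) * 0.02
        self.w_up = np.zeros((bottleneck, d_model))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(x @ self.w_down, 0.0)  # ReLU nonlinearity
        return x + h @ self.w_up              # residual connection

    def num_params(self) -> int:
        return self.w_down.size + self.w_up.size

rng = np.random.default_rng(0)
d_model, bottleneck = 768, 64          # illustrative sizes
adapter = BottleneckAdapter(d_model, bottleneck, rng)

x = rng.standard_normal((4, d_model))  # a batch of token features
y = adapter(x)                         # same shape as the input

# Parameter efficiency: the adapter adds 2 * d_model * bottleneck
# weights, far fewer than a single d_model x d_model layer.
adapter_params = adapter.num_params()
dense_params = d_model * d_model
```

Inserting one such module after each multi-head attention and feed-forward sublayer, plus an attention-based fusion adapter in the cross-modal encoder, keeps the trainable footprint a small fraction of the frozen 340M-parameter backbone.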
Pages: 393-402 (10 pages)