Multi-modal Adapter for Medical Vision-and-Language Learning

Cited by: 1
Authors
Yu, Zheng [1 ]
Qiao, Yanyuan [1 ]
Xie, Yutong [1 ]
Wu, Qi [1 ]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
Source
MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I | 2024 / Vol. 14348
Keywords
Medical Vision-and-Language Learning; Parameter-Efficient Transfer Learning; Multi-Modal Adapter; MODEL;
DOI
10.1007/978-3-031-45673-2_39
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, medical vision-and-language learning has attracted great attention from biomedical communities. Thanks to the development of large pre-trained models, performance on medical multi-modal learning benchmarks has improved greatly. However, as model sizes grow rapidly, fully fine-tuning these large pre-trained models has become costly: a separate copy of the huge parameter set must be trained and stored for every downstream task. We therefore propose a parameter-efficient transfer learning method named Medical Multi-Modal Adapter (M³AD) to mitigate this problem. We select the state-of-the-art M³AE model as our baseline, which is pre-trained on 30k medical image-text pairs with multiple proxy tasks and has about 340M parameters. Specifically, we first insert general adapters after the multi-head attention layers and feed-forward layers in all transformer blocks of M³AE. We then design a modality-fusion adapter that adopts multi-head attention mechanisms, and insert it into the cross-modal encoder to enhance the multi-modal interactions. In contrast to full fine-tuning, we freeze most parameters in M³AE and train only the inserted adapters, which are much smaller. Extensive experimental results on three medical visual question answering datasets and one medical multi-modal classification dataset demonstrate the effectiveness of the proposed method: M³AD achieves performance competitive with full fine-tuning while using far fewer trainable parameters and much less memory.
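The abstract does not include implementation details, but the two adapter types it describes follow a well-known pattern. Below is a minimal PyTorch sketch, assuming the general adapter is a standard bottleneck module (down-projection, non-linearity, up-projection, residual) and the modality-fusion adapter wraps cross-modal multi-head attention around such a bottleneck. All class names, the GELU activation, and the bottleneck widths are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter (assumed design): down-project, non-linearity,
    up-project, residual add. Inserted after attention / feed-forward layers;
    only these small weights are trained."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class FusionAdapter(nn.Module):
    """Hypothetical modality-fusion adapter: multi-head attention lets one
    modality's tokens attend to the other's (e.g. image queries over text
    keys/values), followed by a bottleneck adapter; residual connections keep
    the frozen backbone's features flowing through."""
    def __init__(self, dim: int, num_heads: int = 8, bottleneck: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.bottleneck = Adapter(dim, bottleneck)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(x, context, context)  # query=x, key/value=context
        return self.bottleneck(x + attended)

def trainable_fraction(model: nn.Module) -> float:
    """Fraction of parameters that would be updated during fine-tuning."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable / total

# Freeze a stand-in backbone block and attach a small trainable adapter,
# mirroring the paper's freeze-backbone / train-adapters-only recipe.
backbone = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False  # frozen pre-trained weights
adapter = Adapter(256, bottleneck=32)  # small trainable insert
combined = nn.ModuleDict({"backbone": backbone, "adapter": adapter})
```

With these toy sizes the adapter accounts for only a few percent of the combined parameter count, which is the mechanism behind the abstract's claim of far fewer trainable parameters than full fine-tuning.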
Pages: 393-402
Page count: 10