Multi-modal Adapter for Medical Vision-and-Language Learning

Cited by: 1
Authors
Yu, Zheng [1 ]
Qiao, Yanyuan [1 ]
Xie, Yutong [1 ]
Wu, Qi [1 ]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
Source
MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I | 2024 / Vol. 14348
Keywords
Medical Vision-and-Language Learning; Parameter-Efficient Transfer Learning; Multi-Modal Adapter; MODEL;
DOI
10.1007/978-3-031-45673-2_39
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, medical vision-and-language learning has attracted great attention from biomedical communities. Thanks to the development of large pre-trained models, performance on medical multi-modal learning benchmarks has improved greatly. However, as model sizes grow rapidly, fully fine-tuning these large pre-trained models has become costly: a separate copy of the huge parameter set must be trained and stored for every downstream task. We therefore propose a parameter-efficient transfer learning method named Medical Multi-Modal Adapter (M³AD) to mitigate this problem. We select the state-of-the-art M³AE model as our baseline, which is pre-trained on 30k medical image-text pairs with multiple proxy tasks and has about 340M parameters. Specifically, we first insert general adapters after the multi-head attention layers and feed-forward layers in all transformer blocks of M³AE. We then design a modality-fusion adapter that adopts multi-head attention mechanisms, and insert it into the cross-modal encoder to enhance the multi-modal interactions. In contrast to full fine-tuning, we freeze most parameters in M³AE and train only the inserted adapters, which are much smaller. Extensive experimental results on three medical visual question answering datasets and one medical multi-modal classification dataset demonstrate the effectiveness of the proposed method: M³AD achieves performance competitive with full fine-tuning while using far fewer trainable parameters and much less memory.
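The abstract does not include implementation details, but the two adapter types it describes follow a well-known pattern. Below is a minimal PyTorch sketch, assuming the general adapter is a standard bottleneck module (down-projection, non-linearity, up-projection, residual) and the modality-fusion adapter wraps cross-modal multi-head attention around such a bottleneck. All class names, the GELU activation, and the bottleneck widths are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter (assumed design): down-project, non-linearity,
    up-project, residual add. Inserted after attention / feed-forward layers;
    only these small weights are trained."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class FusionAdapter(nn.Module):
    """Hypothetical modality-fusion adapter: multi-head attention lets one
    modality's tokens attend to the other's (e.g. image queries over text
    keys/values), followed by a bottleneck adapter; residual connections keep
    the frozen backbone's features flowing through."""
    def __init__(self, dim: int, num_heads: int = 8, bottleneck: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.bottleneck = Adapter(dim, bottleneck)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(x, context, context)  # query=x, key/value=context
        return self.bottleneck(x + attended)

def trainable_fraction(model: nn.Module) -> float:
    """Fraction of parameters that would be updated during fine-tuning."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable / total

# Freeze a stand-in backbone block and attach a small trainable adapter,
# mirroring the paper's freeze-backbone / train-adapters-only recipe.
backbone = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False  # frozen pre-trained weights
adapter = Adapter(256, bottleneck=32)  # small trainable insert
combined = nn.ModuleDict({"backbone": backbone, "adapter": adapter})
```

With these toy sizes the adapter accounts for only a few percent of the combined parameter count, which is the mechanism behind the abstract's claim of far fewer trainable parameters than full fine-tuning.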
Pages: 393-402
Page count: 10