Multi-modal Adapter for Medical Vision-and-Language Learning

Cited by: 1
Authors
Yu, Zheng [1 ]
Qiao, Yanyuan [1 ]
Xie, Yutong [1 ]
Wu, Qi [1 ]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
Source
MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I | 2024, Vol. 14348
Keywords
Medical Vision-and-Language Learning; Parameter-Efficient Transfer Learning; Multi-Modal Adapter; MODEL;
DOI
10.1007/978-3-031-45673-2_39
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, medical vision-and-language learning has attracted great attention from the biomedical community. Thanks to the development of large pre-trained models, performance on medical multi-modal learning benchmarks has improved substantially. However, as model sizes grow rapidly, fully fine-tuning these large pre-trained models has become costly: the entire parameter set must be trained and stored for each downstream task. We therefore propose a parameter-efficient transfer learning method named Medical Multi-Modal Adapter (M³AD) to mitigate this problem. We select the state-of-the-art M³AE model as our baseline, which is pre-trained on 30k medical image-text pairs with multiple proxy tasks and has about 340M parameters. Specifically, we first insert general adapters after the multi-head attention and feed-forward layers in all transformer blocks of M³AE. We then design a modality-fusion adapter that adopts multi-head attention mechanisms and insert it into the cross-modal encoder to enhance multi-modal interactions. In contrast to full fine-tuning, we freeze most parameters of M³AE and train only the inserted adapters, which are much smaller. Extensive experiments on three medical visual question answering datasets and one medical multi-modal classification dataset demonstrate the effectiveness of the proposed method: M³AD achieves performance competitive with full fine-tuning while requiring far fewer trainable parameters and less memory.
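The general adapters described above typically follow the standard bottleneck design from the parameter-efficient transfer learning literature: project the hidden state down to a small dimension, apply a nonlinearity, project back up, and add a residual connection. The sketch below illustrates this idea and the resulting parameter savings; the dimensions, initialization, and class name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

class BottleneckAdapter:
    """Generic bottleneck adapter (hypothetical sketch):
    down-projection -> ReLU -> up-projection -> residual add.
    Only these two small matrices would be trained; the host
    transformer's weights stay frozen."""

    def __init__(self, d_model: int, bottleneck: int, rng: np.random.Generator):
        # Small random down-projection; zero-initialized up-projection
        # makes the adapter start as an identity map (a common choice).
        self.w_down = rng.standard_normal((d_model, bottleneck)) * 0.02
        self.w_up = np.zeros((bottleneck, d_model))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(x @ self.w_down, 0.0)  # ReLU nonlinearity
        return x + h @ self.w_up              # residual connection

    def num_params(self) -> int:
        return self.w_down.size + self.w_up.size

rng = np.random.default_rng(0)
d_model, bottleneck = 768, 64          # illustrative sizes
adapter = BottleneckAdapter(d_model, bottleneck, rng)

x = rng.standard_normal((4, d_model))  # a batch of token features
y = adapter(x)                         # same shape as the input

# Parameter efficiency: the adapter adds 2 * d_model * bottleneck
# weights, far fewer than a single d_model x d_model layer.
adapter_params = adapter.num_params()
dense_params = d_model * d_model
```

Inserting one such module after each multi-head attention and feed-forward sublayer, plus an attention-based fusion adapter in the cross-modal encoder, keeps the trainable footprint a small fraction of the frozen 340M-parameter backbone.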
Pages: 393-402 (10 pages)