Multi-modal Adapter for Medical Vision-and-Language Learning

Cited by: 1
Authors
Yu, Zheng [1]
Qiao, Yanyuan [1]
Xie, Yutong [1]
Wu, Qi [1]
Affiliation
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
Source
MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I | 2024 / Vol. 14348
Keywords
Medical Vision-and-Language Learning; Parameter-Efficient Transfer Learning; Multi-Modal Adapter; MODEL
DOI
10.1007/978-3-031-45673-2_39
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, medical vision-and-language learning has attracted great attention from the biomedical community. Thanks to the development of large pre-trained models, performance on medical multi-modal learning benchmarks has improved greatly. However, as model sizes grow rapidly, fully fine-tuning these large pre-trained models has become costly, both in training and in storing a full set of parameters for each downstream task. We therefore propose a parameter-efficient transfer learning method named Medical Multi-Modal Adapter (M³AD) to mitigate this problem. We select the state-of-the-art M³AE model as our baseline, which is pre-trained on 30k medical image-text pairs with multiple proxy tasks and has about 340M parameters. Specifically, we first insert general adapters after the multi-head attention layers and feed-forward layers in all transformer blocks of M³AE. We then design a modality-fusion adapter that adopts multi-head attention mechanisms and insert it in the cross-modal encoder to enhance multi-modal interactions. In contrast to full fine-tuning, we freeze most parameters in M³AE and train only the inserted adapters, which are much smaller. Extensive experimental results on three medical visual question answering datasets and one medical multi-modal classification dataset demonstrate the effectiveness of our proposed method: M³AD achieves performance competitive with full fine-tuning while using far fewer trainable parameters and less memory.
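The abstract does not describe the internals of the "general adapters", so the following is only a minimal sketch of a common residual bottleneck adapter (down-projection, non-linearity, up-projection, skip connection), which is one plausible reading. The class name `BottleneckAdapter`, the hidden size 768, the bottleneck size 64, and the zero initialization of the up-projection are all illustrative assumptions, not values from the paper. The sketch uses NumPy and covers the forward pass only; it does not model the modality-fusion adapter or training.

```python
import numpy as np


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: x + up(relu(down(x)))."""

    def __init__(self, hidden_dim, bottleneck_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Down-projection from the transformer width to a small bottleneck.
        self.w_down = rng.normal(0.0, 0.02, size=(hidden_dim, bottleneck_dim))
        self.b_down = np.zeros(bottleneck_dim)
        # Assumption: zero-initialized up-projection, so the adapter starts
        # as an identity mapping and does not disturb the frozen backbone.
        self.w_up = np.zeros((bottleneck_dim, hidden_dim))
        self.b_up = np.zeros(hidden_dim)

    def __call__(self, x):
        # x has shape (batch, tokens, hidden_dim).
        h = np.maximum(x @ self.w_down + self.b_down, 0.0)  # ReLU
        return x + h @ self.w_up + self.b_up  # residual connection

    def num_params(self):
        # Only these parameters would be trained; the backbone stays frozen.
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


# Illustrative sizes only (not taken from the paper).
adapter = BottleneckAdapter(hidden_dim=768, bottleneck_dim=64)
tokens = np.random.default_rng(1).normal(size=(2, 16, 768))
out = adapter(tokens)
```

Under these assumed sizes a single adapter holds 2 × 768 × 64 + 64 + 768 = 99,136 parameters, which illustrates why training only inserted adapters is far cheaper than updating a backbone of about 340M parameters.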
Pages: 393-402
Page count: 10