SDDA: A progressive self-distillation with decoupled alignment for multimodal image–text classification

Cited by: 0
Authors
Chen, Xiaohao [1 ]
Shuai, Qianjun [1 ]
Hu, Feng [1 ]
Cheng, Yongqiang [2 ]
Affiliations
[1] College of Information and Communication Engineering, Communication University of China, Beijing 100024, China
[2] Faculty of Technology, University of Sunderland, Sunderland SR6 0DD, United Kingdom
Keywords
Image classification;
DOI
10.1016/j.neucom.2024.128794
Abstract
Multimodal image–text classification aims to infer the correct category from the information encapsulated in image–text pairs. Although current image–text methods achieve commendable performance, intrinsic multimodal heterogeneity remains a challenge: the contributions of the different modalities vary considerably. In this study, we address this issue with a novel progressive Self-Distillation with Decoupled Alignment (SDDA) approach, which facilitates fine-grained alignment of the shared and private components of image and text features in a low-dimensional space, thereby reducing information redundancy. Specifically, each modality representation is decoupled in an autoregressive manner into two segments within a modality-irrelevant/exclusive space. SDDA imparts additional knowledge transfer to each decoupled segment via self-distillation, while also offering flexible, richer multimodal knowledge supervision for unimodal features. Multimodal classification experiments on two publicly available benchmark datasets verify the efficacy of the algorithm, demonstrating that SDDA surpasses state-of-the-art baselines. © 2024 Elsevier B.V.
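The abstract's core idea, decoupling each modality's representation into shared and private segments and then distilling multimodal (teacher) knowledge into unimodal (student) predictions, can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: the split-in-half `decouple`, the random classifier weights `W`, and the temperature value are all assumptions made for demonstration; the standard distillation objective (temperature-scaled softmax plus KL divergence) is swapped in for whatever losses SDDA actually uses.

```python
import numpy as np

def softmax(z, tau=1.0):
    # Temperature-scaled softmax over the last axis (numerically stable).
    z = z / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q): the usual soft-label distillation objective between
    # a teacher distribution p and a student distribution q.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def decouple(feat, shared_dim):
    # Toy "decoupling": split one modality's feature vector into a
    # modality-shared part and a modality-private part. (The paper's
    # autoregressive decoupling is more elaborate; this is a stand-in.)
    return feat[:shared_dim], feat[shared_dim:]

rng = np.random.default_rng(0)
img_feat = rng.normal(size=8)   # hypothetical image feature
txt_feat = rng.normal(size=8)   # hypothetical text feature

img_shared, img_private = decouple(img_feat, shared_dim=4)
txt_shared, txt_private = decouple(txt_feat, shared_dim=4)

# Hypothetical classifier: the fused shared features act as the teacher,
# a single modality's decoupled features act as the student.
W = rng.normal(size=(8, 3))     # toy weights for a 3-class problem
teacher_logits = np.concatenate([img_shared, txt_shared]) @ W
student_logits = np.concatenate([img_shared, img_private]) @ W

tau = 2.0                        # distillation temperature (assumed)
distill_loss = kl_divergence(softmax(teacher_logits, tau),
                             softmax(student_logits, tau))
```

Minimizing `distill_loss` (alongside a normal classification loss) would push the unimodal student's prediction toward the multimodal teacher's, which is the "richer multimodal knowledge supervision for unimodal features" the abstract describes.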
Related papers
(50 in total)
  • [21] Self-Distillation for Few-Shot Image Captioning
    Chen, Xianyu
    Jiang, Ming
    Zhao, Qi
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 545 - 555
  • [22] Sketch Classification and Sketch Based Image Retrieval Using ViT with Self-Distillation for Few Samples
    Kang, Sungjae
    Seo, Kisung
    JOURNAL OF ELECTRICAL ENGINEERING & TECHNOLOGY, 2024, 19 (07) : 4587 - 4593
  • [23] Self-Distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach
    Zhang, Ziyin
    Lu, Ning
    Liao, Minghui
    Huang, Yongshuai
    Li, Cheng
    Wang, Min
    Peng, Wei
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7441 - 7449
  • [24] SPSD: Similarity-preserving self-distillation for video–text retrieval
    Wang, Jiachen
    Hua, Yan
    Yang, Yingyun
    Kou, Hongwei
    International Journal of Multimedia Information Retrieval, 2023, 12
  • [26] Robust Cross-Modal Representation Learning with Progressive Self-Distillation
    Andonian, Alex
    Chen, Shixing
    Hamid, Raffay
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16409 - 16420
  • [27] Confidence Matters: Enhancing Medical Image Classification Through Uncertainty-Driven Contrastive Self-distillation
    Sharma, Saurabh
    Kumar, Atul
    Chandra, Joydeep
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT X, 2024, 15010 : 133 - 142
  • [28] StAlK: Structural Alignment based Self Knowledge distillation for Medical Image Classification
    Sharma, Saurabh
    Kumar, Atul
    Monpara, Jenish
    Chandra, Joydeep
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [29] A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations
    Ma, Hui
    Wang, Jian
    Lin, Hongfei
    Zhang, Bo
    Zhang, Yijia
    Xu, Bo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 776 - 788
  • [30] A Unified Self-Distillation Framework for Multimodal Sentiment Analysis with Uncertain Missing Modalities
    Li, Mingcheng
    Yang, Dingkang
    Lei, Yuxuan
    Wang, Shunli
    Wang, Shuaibing
    Su, Liuzhen
    Yang, Kun
    Wang, Yuzheng
    Sun, Mingyang
    Zhang, Lihua
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9, 2024, : 10074 - 10082