DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification

Times Cited: 3
Authors
Zou, Xin [1 ]
Tang, Chang [1 ]
Zheng, Xiao [2 ]
Li, Zhenglai [1 ]
He, Xiao [1 ]
An, Shan [3 ]
Liu, Xinwang [2 ]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha, Peoples R China
[3] JD Hlth Int Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
US National Science Foundation; National Key Research and Development Program of China;
Keywords
trustworthy multi-modal classification; confidence estimation; dynamical fusion; attention; cross-modal low-rank fusion; FUSION;
DOI
10.1145/3581783.3612652
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With advances in sensing technology, multi-modal data collected from different sources are increasingly available. Multi-modal classification aims to integrate complementary information from multi-modal data to improve classification performance. However, existing multi-modal classification methods are generally weak at integrating global structural information and providing trustworthy multi-modal fusion, especially in safety-sensitive practical applications (e.g., medical diagnosis). In this paper, we propose a novel Dynamic Poly-attention Network (DPNET) for trustworthy multi-modal classification. Specifically, DPNET has four merits: (i) To capture the intrinsic modality-specific structural information, we design a structure-aware feature aggregation module to learn the corresponding structure-preserved global compact feature representation. (ii) A transparent fusion strategy based on modality confidence estimation is introduced to track information variation within different modalities for dynamic fusion. (iii) To facilitate more effective and efficient multi-modal fusion, we introduce a cross-modal low-rank fusion module that reduces the complexity of tensor-based fusion and highlights the contribution of different rank-wise features via a rank attention mechanism. (iv) A label confidence estimation module is devised to drive the network to generate more credible confidence scores. An intra-class attention loss is introduced to supervise the network training. Extensive experiments on four real-world multi-modal biomedical datasets demonstrate that the proposed method achieves competitive performance compared with other state-of-the-art methods.
Pages: 3550-3559
Page Count: 10
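
The abstract names two fusion mechanisms that lend themselves to a concrete illustration: cross-modal low-rank fusion with a rank attention mechanism, and confidence-based dynamic fusion. Below is a minimal PyTorch sketch of both ideas in the spirit of low-rank multimodal fusion, where the full outer-product fusion tensor is replaced by rank-wise factor products. This is not the authors' DPNET implementation; the class name, dimensions, the learnable rank-attention logits, and the scalar sigmoid confidence heads are all illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of confidence-weighted,
# rank-attentive low-rank multimodal fusion. All names are hypothetical.
import torch
import torch.nn as nn

class LowRankFusionWithRankAttention(nn.Module):
    def __init__(self, dims, out_dim, rank=4):
        super().__init__()
        # One rank-R factor matrix per modality: maps d_m -> (rank, out_dim),
        # so the cross-modal product is taken slice-by-slice instead of
        # materializing the full fusion tensor.
        self.factors = nn.ModuleList(
            nn.Linear(d, rank * out_dim, bias=False) for d in dims
        )
        # Hypothetical rank attention: one learnable logit per rank slice.
        self.rank_logits = nn.Parameter(torch.zeros(rank))
        # Per-modality confidence heads (assumed scalar sigmoid scores).
        self.conf_heads = nn.ModuleList(nn.Linear(d, 1) for d in dims)
        self.rank, self.out_dim = rank, out_dim

    def forward(self, feats):
        # feats: list of (batch, d_m) modality features.
        B = feats[0].shape[0]
        fused = None
        for x, proj, conf_head in zip(feats, self.factors, self.conf_heads):
            conf = torch.sigmoid(conf_head(x))             # (B, 1) confidence
            z = proj(x).view(B, self.rank, self.out_dim)   # rank-wise factors
            z = conf.unsqueeze(-1) * z                     # dynamic weighting
            fused = z if fused is None else fused * z      # cross-modal product
        # Attention over the rank dimension, then sum the rank slices.
        alpha = torch.softmax(self.rank_logits, dim=0).view(1, self.rank, 1)
        return (alpha * fused).sum(dim=1)                  # (B, out_dim)

# Usage: fuse three modalities of different dimensionality.
fusion = LowRankFusionWithRankAttention(dims=[64, 128, 32], out_dim=256, rank=4)
h = fusion([torch.randn(8, 64), torch.randn(8, 128), torch.randn(8, 32)])
print(h.shape)  # torch.Size([8, 256])
```

The design point the abstract emphasizes is that this factorized product is linear in the number of modalities, whereas contracting an explicit outer-product tensor grows exponentially with modality count; the rank attention then lets the model weight the rank slices unevenly rather than summing them uniformly.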