DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification

Cited by: 3
Authors
Zou, Xin [1 ]
Tang, Chang [1 ]
Zheng, Xiao [2 ]
Li, Zhenglai [1 ]
He, Xiao [1 ]
An, Shan [3 ]
Liu, Xinwang [2 ]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha, Peoples R China
[3] JD Hlth Int Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Key Research and Development Program of China; US National Science Foundation;
Keywords
trustworthy multi-modal classification; confidence estimation; dynamic fusion; attention; cross-modal low-rank fusion;
DOI
10.1145/3581783.3612652
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
With advances in sensing technology, multi-modal data collected from different sources are increasingly available. Multi-modal classification aims to integrate complementary information from multi-modal data to improve classification performance. However, existing multi-modal classification methods are generally weak at integrating global structural information and providing trustworthy multi-modal fusion, especially in safety-sensitive practical applications (e.g., medical diagnosis). In this paper, we propose a novel Dynamic Poly-attention Network (DPNET) for trustworthy multi-modal classification. Specifically, DPNET has four merits: (i) To capture the intrinsic modality-specific structural information, we design a structure-aware feature aggregation module to learn a structure-preserved global compact feature representation. (ii) A transparent fusion strategy based on modality confidence estimation is introduced to track information variation within different modalities for dynamic fusion. (iii) To facilitate more effective and efficient multi-modal fusion, we introduce a cross-modal low-rank fusion module that reduces the complexity of tensor-based fusion and emphasizes the contributions of different rank-wise features via a rank attention mechanism. (iv) A label confidence estimation module is devised to drive the network to generate more credible confidence, and an intra-class attention loss is introduced to supervise network training. Extensive experiments on four real-world multi-modal biomedical datasets demonstrate that the proposed method achieves competitive performance compared with other state-of-the-art methods.
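To make the cross-modal low-rank fusion with rank attention described in point (iii) concrete, the PyTorch-style module below is a minimal sketch reconstructed from the abstract alone, not the authors' released code: the class name, dimensions, and the softmax-based rank attention head are all assumptions. Each modality feature is projected by a per-modality low-rank factor, the factors are combined by an elementwise product to approximate full tensor (outer-product) fusion at reduced cost, and the resulting rank-wise fused features are reweighted by a learned attention over ranks before aggregation.

```python
import torch
import torch.nn as nn

class LowRankFusionWithRankAttention(nn.Module):
    """Hypothetical sketch of cross-modal low-rank fusion with rank
    attention, inferred from the DPNET abstract (not the official code)."""

    def __init__(self, input_dims, fused_dim, rank):
        super().__init__()
        self.rank = rank
        self.fused_dim = fused_dim
        # One low-rank factor per modality: maps (d_m + 1) -> rank * fused_dim.
        # The appended constant 1 follows the usual low-rank fusion trick so
        # that unimodal terms survive the cross-modal product.
        self.factors = nn.ModuleList(
            nn.Linear(d + 1, rank * fused_dim, bias=False) for d in input_dims
        )
        # Rank attention: scores each of the `rank` rank-wise fused features.
        self.rank_attn = nn.Linear(fused_dim, 1)

    def forward(self, feats):
        # feats: list of (batch, d_m) modality features
        batch = feats[0].size(0)
        fused = None
        for x, proj in zip(feats, self.factors):
            ones = x.new_ones(batch, 1)
            h = proj(torch.cat([x, ones], dim=1))         # (batch, rank * fused_dim)
            h = h.view(batch, self.rank, self.fused_dim)  # rank-wise features
            # Elementwise product across modalities approximates tensor fusion.
            fused = h if fused is None else fused * h
        # Attention over ranks, then weighted aggregation.
        attn = torch.softmax(self.rank_attn(fused), dim=1)  # (batch, rank, 1)
        return (attn * fused).sum(dim=1)                    # (batch, fused_dim)

# Usage: fuse two modalities of dims 64 and 32 into a 16-d representation.
# fusion = LowRankFusionWithRankAttention([64, 32], fused_dim=16, rank=4)
# out = fusion([torch.randn(8, 64), torch.randn(8, 32)])  # -> (8, 16)
```

Compared with full outer-product tensor fusion, whose fused tensor grows multiplicatively with the number of modalities, this factorized form keeps the cost linear in both the number of modalities and the rank, which is consistent with the abstract's stated goal of reducing the complexity of tensor-based fusion.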
Pages: 3550-3559
Number of pages: 10