DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification

Citations: 3
Authors
Zou, Xin [1 ]
Tang, Chang [1 ]
Zheng, Xiao [2 ]
Li, Zhenglai [1 ]
He, Xiao [1 ]
An, Shan [3 ]
Liu, Xinwang [2 ]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha, Peoples R China
[3] JD Hlth Int Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Key Research and Development Program of China; U.S. National Science Foundation;
Keywords
trustworthy multi-modal classification; confidence estimation; dynamical fusion; attention; cross-modal low-rank fusion
DOI
10.1145/3581783.3612652
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
With advances in sensing technology, multi-modal data collected from different sources are increasingly available. Multi-modal classification aims to integrate complementary information from multi-modal data to improve model classification performance. However, existing multi-modal classification methods are generally weak at integrating global structural information and providing trustworthy multi-modal fusion, especially in safety-sensitive practical applications (e.g., medical diagnosis). In this paper, we propose a novel Dynamic Poly-attention Network (DPNET) for trustworthy multi-modal classification. Specifically, DPNET has four merits: (i) To capture the intrinsic modality-specific structural information, we design a structure-aware feature aggregation module that learns a structure-preserved global compact feature representation. (ii) A transparent fusion strategy based on modality confidence estimation is introduced to track information variation within different modalities for dynamic fusion. (iii) To facilitate more effective and efficient multi-modal fusion, we introduce a cross-modal low-rank fusion module that reduces the complexity of tensor-based fusion and activates the contribution of different rank-wise features via a rank attention mechanism. (iv) A label confidence estimation module is devised to drive the network to generate more credible confidence, and an intra-class attention loss is introduced to supervise the network training. Extensive experiments on four real-world multi-modal biomedical datasets demonstrate that the proposed method achieves competitive performance compared to state-of-the-art methods.
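Point (iii) of the abstract echoes the well-known low-rank multimodal fusion idea (LMF, Liu et al., ACL 2018): instead of materializing the full outer-product fusion tensor, each modality is projected by rank-R factor matrices and the rank-wise features are combined by element-wise products across modalities; DPNET additionally weights the R fused features with a rank attention before summation (replacing LMF's fixed per-rank weights). The sketch below is a minimal illustration under those assumptions, not the authors' implementation; all names (`LowRankFusion`, `rank_attn`, `rank`, `out_dim`) are hypothetical.

```python
# Hedged sketch: cross-modal low-rank fusion with rank attention,
# in the spirit of LMF plus the abstract's rank attention mechanism.
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dims, out_dim, rank):
        super().__init__()
        # One rank-R factor matrix per modality; appending a constant 1
        # to each input lets the factors absorb bias terms.
        self.factors = nn.ModuleList(
            nn.Linear(d + 1, rank * out_dim, bias=False) for d in dims
        )
        self.rank = rank
        self.out_dim = out_dim
        # Rank attention: scores each of the R rank-wise fused features.
        self.rank_attn = nn.Linear(out_dim, 1)

    def forward(self, feats):  # feats: list of (B, d_m) tensors
        fused = None
        for x, proj in zip(feats, self.factors):
            ones = torch.ones(x.size(0), 1, device=x.device)
            h = proj(torch.cat([x, ones], dim=1))      # (B, R*out_dim)
            h = h.view(-1, self.rank, self.out_dim)    # (B, R, out_dim)
            # Element-wise product accumulates the cross-modal interaction.
            fused = h if fused is None else fused * h
        attn = torch.softmax(self.rank_attn(fused), dim=1)  # (B, R, 1)
        return (attn * fused).sum(dim=1)               # (B, out_dim)

# Usage: fuse three modalities of dims 128, 64, 32 into a 256-d feature.
model = LowRankFusion(dims=[128, 64, 32], out_dim=256, rank=4)
z = model([torch.randn(8, 128), torch.randn(8, 64), torch.randn(8, 32)])
print(z.shape)  # torch.Size([8, 256])
```

The rank-R factorization keeps the cost linear in the number of modalities (one projection plus an element-wise product each), versus the exponential blow-up of a full outer-product tensor fusion.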
Pages: 3550-3559
Page Count: 10
Related Papers
50 records in total
  • [31] Zhou, Tongxue; Canu, Stephane; Ruan, Su. Fusion based on attention mechanism and context constraint for multi-modal brain tumor segmentation. COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, 2020, 86.
  • [32] Ekambaram, Vijay; Manglik, Kushagra; Mukherjee, Sumanta; Sajja, Surya Shravan Kumar; Dwivedi, Satyam; Raykar, Vikas. Attention based Multi-Modal New Product Sales Time-series Forecasting. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020: 3110-3118.
  • [33] Banerjee, Ankan; Patra, Dipti; Roy, Pradipta. Improved Multi-modal Image Fusion with Attention and Dense Networks: Visual and Quantitative Evaluation. COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III, 2024, 2011: 237-248.
  • [34] Anundskas, Lars Halvor; Afridi, Hina; Tarekegn, Adane Nega; Yamin, Muhammad Mudassar; Ullah, Mohib; Yamin, Saira; Cheikh, Faouzi Alaya. GLOVE-ING ATTENTION: A MULTI-MODAL NEURAL LEARNING APPROACH TO IMAGE CAPTIONING. 2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023.
  • [35] Ding, Yue; Liu, Jingjing; Zhang, Xiaochen; Yang, Zhi. Dynamic Tracking of State Anxiety via Multi-Modal Data and Machine Learning. FRONTIERS IN PSYCHIATRY, 2022, 13.
  • [36] Lin, Feng; Hu, Xie; Lin, Yiling; Li, Yao; Liu, Yang; Li, Dongmei. Dual-branch multi-modal convergence network for crater detection using Chang'e image. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 134.
  • [37] Zhang, Yu; Fu, Zilong; Huang, Fuyu; Liu, Yizhi. PMMN: Pre-trained multi-Modal network for scene text recognition. PATTERN RECOGNITION LETTERS, 2021, 151: 103-111.
  • [38] Yang, Yang; Yang, Jia-Qi; Bao, Ran; Zhan, De-Chuan; Zhu, Hengshu; Gao, Xiao-Ru; Xiong, Hui; Yang, Jian. Corporate Relative Valuation Using Heterogeneous Multi-Modal Graph Neural Network. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (01): 211-224.
  • [39] Zhou, Wei; An, Lu; Han, Ruilian; Li, Gang. Classification and severity assessment of disaster losses based on multi-modal information in social media. INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (05).
  • [40] Aspri, Maria; Tsagkatakis, Grigorios; Tsakalides, Panagiotis. Distributed Training and Inference of Deep Learning Models for Multi-Modal Land Cover Classification. REMOTE SENSING, 2020, 12 (17).