DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification

Cited by: 3
Authors
Zou, Xin [1 ]
Tang, Chang [1 ]
Zheng, Xiao [2 ]
Li, Zhenglai [1 ]
He, Xiao [1 ]
An, Shan [3 ]
Liu, Xinwang [2 ]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha, Peoples R China
[3] JD Hlth Int Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Key R&D Program of China; National Science Foundation (USA);
Keywords
trustworthy multi-modal classification; confidence estimation; dynamical fusion; attention; cross-modal low-rank fusion; FUSION;
DOI
10.1145/3581783.3612652
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With advances in sensing technology, multi-modal data collected from different sources are increasingly available. Multi-modal classification aims to integrate complementary information from multi-modal data to improve classification performance. However, existing multi-modal classification methods are generally weak at integrating global structural information and at providing trustworthy multi-modal fusion, especially in safety-sensitive practical applications (e.g., medical diagnosis). In this paper, we propose a novel Dynamic Poly-attention Network (DPNET) for trustworthy multi-modal classification. Specifically, DPNET has four merits: (i) to capture the intrinsic modality-specific structural information, we design a structure-aware feature aggregation module that learns a structure-preserved, globally compact feature representation; (ii) a transparent fusion strategy based on modality confidence estimation is introduced to track information variation across modalities for dynamic fusion; (iii) to make multi-modal fusion more effective and efficient, we introduce a cross-modal low-rank fusion module that reduces the complexity of tensor-based fusion and activates different rank-wise features via a rank attention mechanism; (iv) a label confidence estimation module is devised to drive the network to produce more credible confidence scores. An intra-class attention loss is introduced to supervise network training. Extensive experiments on four real-world multi-modal biomedical datasets demonstrate that the proposed method achieves competitive performance compared with state-of-the-art methods.
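The cross-modal low-rank fusion with rank attention described in point (iii) can be made concrete with a short sketch. Below is a minimal PyTorch illustration, assuming rank-R factorized bilinear fusion in the spirit of low-rank multimodal fusion, with a learned softmax attention over the R rank-wise features in place of the usual fixed summation. All class and variable names, tensor shapes, and the exact attention form here are assumptions for illustration, not the authors' DPNET implementation.

import torch
import torch.nn as nn

class LowRankFusionWithRankAttention(nn.Module):
    # Hypothetical sketch (not the authors' code): fuses M modality
    # embeddings via rank-R factorized bilinear fusion, then reweights
    # the R rank-wise fused features with a learned attention.
    def __init__(self, in_dims, out_dim, rank):
        super().__init__()
        self.rank = rank
        # One rank-R factor tensor per modality; appending a constant 1
        # to each input keeps unimodal terms in the fused product.
        self.factors = nn.ParameterList(
            [nn.Parameter(0.02 * torch.randn(rank, d + 1, out_dim)) for d in in_dims]
        )
        # Rank attention: scores each of the R rank-wise fused features.
        self.rank_attn = nn.Linear(out_dim, 1)

    def forward(self, feats):
        # feats: list of (batch, d_m) modality embeddings.
        batch = feats[0].size(0)
        fused = None
        for x, w in zip(feats, self.factors):
            x1 = torch.cat([x, x.new_ones(batch, 1)], dim=1)  # (batch, d_m + 1)
            proj = torch.einsum('bd,rdo->bro', x1, w)         # (batch, rank, out_dim)
            fused = proj if fused is None else fused * proj   # product across modalities
        # Attention over the rank axis replaces the usual fixed summation.
        attn = torch.softmax(self.rank_attn(fused).squeeze(-1), dim=1)  # (batch, rank)
        return (attn.unsqueeze(-1) * fused).sum(dim=1)        # (batch, out_dim)

# Usage: fuse three modalities of different dimensionality.
fusion = LowRankFusionWithRankAttention(in_dims=[128, 64, 32], out_dim=100, rank=4)
feats = [torch.randn(8, d) for d in (128, 64, 32)]
print(fusion(feats).shape)  # torch.Size([8, 100])

The factorized form keeps the parameter count linear in the number of modalities rather than exponential, which is the complexity reduction the abstract refers to; the attention over the rank axis is one plausible reading of "rank attention".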
Pages: 3550 - 3559
Page count: 10
Related Papers
50 records in total
  • [22] An attention based multi-modal gender identification system for social media users
    Suman, Chanchal
    Chaudhary, Rohit Shyamkant
    Saha, Sriparna
    Bhattacharyya, Pushpak
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (19) : 27033 - 27055
  • [23] Multi-modal Quality Prediction Algorithm Based on Anomalous Energy Tracking Attention
    Li, Haoyong
    Zhang, Qifei
    Li, Wenjuan
    Liang, Xiubo
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT II, ICIC 2024, 2024, 14876 : 150 - 162
  • [24] Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection
    Liang, Yanhua
    Qin, Guihe
    Sun, Minghui
    Qin, Jun
    Yan, Jie
    Zhang, Zhonghan
    NEUROCOMPUTING, 2022, 490 : 132 - 145
  • [25] A scalable multi-modal learning fruit detection algorithm for dynamic environments
    Mao, Liang
    Guo, Zihao
    Liu, Mingzhe
    Li, Yue
    Wang, Linlin
    Li, Jie
    FRONTIERS IN NEUROROBOTICS, 2025, 18
  • [26] Multi-Modal Convolutional Parameterisation Network for Guided Image Inverse Problems
    Czerkawski, Mikolaj
    Upadhyay, Priti
    Davison, Christopher
    Atkinson, Robert
    Michie, Craig
    Andonovic, Ivan
    Macdonald, Malcolm
    Cardona, Javier
    Tachtatzis, Christos
    JOURNAL OF IMAGING, 2024, 10 (03)
  • [27] Pedestrian detection network with multi-modal cross-guided learning
    Hua, ChunJian
    Sun, MingChun
    Zhu, Yu
    Jiang, Yi
    Yu, JianFeng
    Chen, Ying
    DIGITAL SIGNAL PROCESSING, 2022, 122
  • [28] Efficient and Effective Multi-Modal Queries Through Heterogeneous Network Embedding
    Chi Thang Duong
    Thanh Tam Nguyen
    Yin, Hongzhi
    Weidlich, Matthias
    Mai, Thai Son
    Aberer, Karl
    Quoc Viet Hung Nguyen
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (11) : 5307 - 5320
  • [29] AutoAMS: Automated attention-based multi-modal graph learning architecture search
    Al-Sabri, Raeed
    Gao, Jianliang
    Chen, Jiamin
    Oloulade, Babatounde Moctard
    Wu, Zhenpeng
    NEURAL NETWORKS, 2024, 179
  • [30] Air Quality Prediction with 1-Dimensional Convolution and Attention on Multi-modal Features
    Choi, Junyoung
    Kim, Joonyoung
    Jung, Kyomin
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2021), 2021, : 196 - 202