DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification

Cited by: 3
Authors
Zou, Xin [1 ]
Tang, Chang [1 ]
Zheng, Xiao [2 ]
Li, Zhenglai [1 ]
He, Xiao [1 ]
An, Shan [3 ]
Liu, Xinwang [2 ]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha, Peoples R China
[3] JD Hlth Int Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Key Research and Development Program of China; US National Science Foundation;
Keywords
trustworthy multi-modal classification; confidence estimation; dynamic fusion; attention; cross-modal low-rank fusion;
DOI
10.1145/3581783.3612652
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
With advances in sensing technology, multi-modal data collected from different sources are increasingly available. Multi-modal classification aims to integrate complementary information from multi-modal data to improve classification performance. However, existing multi-modal classification methods are generally weak at integrating global structural information and providing trustworthy multi-modal fusion, especially in safety-sensitive practical applications (e.g., medical diagnosis). In this paper, we propose a novel Dynamic Poly-attention Network (DPNET) for trustworthy multi-modal classification. Specifically, DPNET has four merits: (i) To capture the intrinsic modality-specific structural information, we design a structure-aware feature aggregation module to learn a structure-preserved global compact feature representation. (ii) A transparent fusion strategy based on modality confidence estimation is introduced to track information variation within different modalities for dynamic fusion. (iii) To facilitate more effective and efficient multi-modal fusion, we introduce a cross-modal low-rank fusion module that reduces the complexity of tensor-based fusion and emphasizes the contributions of different rank-wise features via a rank attention mechanism. (iv) A label confidence estimation module is devised to drive the network to generate more credible confidence, and an intra-class attention loss is introduced to supervise network training. Extensive experiments on four real-world multi-modal biomedical datasets demonstrate that the proposed method achieves competitive performance compared with other state-of-the-art methods.
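To make the cross-modal low-rank fusion with rank attention described in point (iii) concrete, the PyTorch-style module below is a minimal sketch reconstructed from the abstract alone, not the authors' released code: the class name, dimensions, and the softmax-based rank attention head are all assumptions. Each modality feature is projected by a per-modality low-rank factor, the factors are combined by an elementwise product to approximate full tensor (outer-product) fusion at reduced cost, and the resulting rank-wise fused features are reweighted by a learned attention over ranks before aggregation.

```python
import torch
import torch.nn as nn

class LowRankFusionWithRankAttention(nn.Module):
    """Hypothetical sketch of cross-modal low-rank fusion with rank
    attention, inferred from the DPNET abstract (not the official code)."""

    def __init__(self, input_dims, fused_dim, rank):
        super().__init__()
        self.rank = rank
        self.fused_dim = fused_dim
        # One low-rank factor per modality: maps (d_m + 1) -> rank * fused_dim.
        # The appended constant 1 follows the usual low-rank fusion trick so
        # that unimodal terms survive the cross-modal product.
        self.factors = nn.ModuleList(
            nn.Linear(d + 1, rank * fused_dim, bias=False) for d in input_dims
        )
        # Rank attention: scores each of the `rank` rank-wise fused features.
        self.rank_attn = nn.Linear(fused_dim, 1)

    def forward(self, feats):
        # feats: list of (batch, d_m) modality features
        batch = feats[0].size(0)
        fused = None
        for x, proj in zip(feats, self.factors):
            ones = x.new_ones(batch, 1)
            h = proj(torch.cat([x, ones], dim=1))         # (batch, rank * fused_dim)
            h = h.view(batch, self.rank, self.fused_dim)  # rank-wise features
            # Elementwise product across modalities approximates tensor fusion.
            fused = h if fused is None else fused * h
        # Attention over ranks, then weighted aggregation.
        attn = torch.softmax(self.rank_attn(fused), dim=1)  # (batch, rank, 1)
        return (attn * fused).sum(dim=1)                    # (batch, fused_dim)

# Usage: fuse two modalities of dims 64 and 32 into a 16-d representation.
# fusion = LowRankFusionWithRankAttention([64, 32], fused_dim=16, rank=4)
# out = fusion([torch.randn(8, 64), torch.randn(8, 32)])  # -> (8, 16)
```

Compared with full outer-product tensor fusion, whose fused tensor grows multiplicatively with the number of modalities, this factorized form keeps the cost linear in both the number of modalities and the rank, which is consistent with the abstract's stated goal of reducing the complexity of tensor-based fusion.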
Pages: 3550-3559
Number of pages: 10