DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification

Times Cited: 3
Authors
Zou, Xin [1 ]
Tang, Chang [1 ]
Zheng, Xiao [2 ]
Li, Zhenglai [1 ]
He, Xiao [1 ]
An, Shan [3 ]
Liu, Xinwang [2 ]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha, Peoples R China
[3] JD Hlth Int Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
US National Science Foundation; National Key Research and Development Program of China;
Keywords
trustworthy multi-modal classification; confidence estimation; dynamical fusion; attention; cross-modal low-rank fusion; FUSION;
DOI
10.1145/3581783.3612652
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With advances in sensing technology, multi-modal data collected from different sources are increasingly available. Multi-modal classification aims to integrate complementary information from multi-modal data to improve classification performance. However, existing multi-modal classification methods are generally weak at integrating global structural information and providing trustworthy multi-modal fusion, especially in safety-sensitive practical applications (e.g., medical diagnosis). In this paper, we propose a novel Dynamic Poly-attention Network (DPNET) for trustworthy multi-modal classification. Specifically, DPNET has four merits: (i) To capture the intrinsic modality-specific structural information, we design a structure-aware feature aggregation module to learn the corresponding structure-preserved global compact feature representation. (ii) A transparent fusion strategy based on modality confidence estimation is introduced to track information variation within different modalities for dynamic fusion. (iii) To facilitate more effective and efficient multi-modal fusion, we introduce a cross-modal low-rank fusion module that reduces the complexity of tensor-based fusion and highlights the contribution of different rank-wise features via a rank attention mechanism. (iv) A label confidence estimation module is devised to drive the network to generate more credible confidence scores. An intra-class attention loss is introduced to supervise the network training. Extensive experiments on four real-world multi-modal biomedical datasets demonstrate that the proposed method achieves competitive performance compared with other state-of-the-art methods.
Pages: 3550-3559
Page Count: 10
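
The abstract names two fusion mechanisms that lend themselves to a concrete illustration: cross-modal low-rank fusion with a rank attention mechanism, and confidence-based dynamic fusion. Below is a minimal PyTorch sketch of both ideas in the spirit of low-rank multimodal fusion, where the full outer-product fusion tensor is replaced by rank-wise factor products. This is not the authors' DPNET implementation; the class name, dimensions, the learnable rank-attention logits, and the scalar sigmoid confidence heads are all illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of confidence-weighted,
# rank-attentive low-rank multimodal fusion. All names are hypothetical.
import torch
import torch.nn as nn

class LowRankFusionWithRankAttention(nn.Module):
    def __init__(self, dims, out_dim, rank=4):
        super().__init__()
        # One rank-R factor matrix per modality: maps d_m -> (rank, out_dim),
        # so the cross-modal product is taken slice-by-slice instead of
        # materializing the full fusion tensor.
        self.factors = nn.ModuleList(
            nn.Linear(d, rank * out_dim, bias=False) for d in dims
        )
        # Hypothetical rank attention: one learnable logit per rank slice.
        self.rank_logits = nn.Parameter(torch.zeros(rank))
        # Per-modality confidence heads (assumed scalar sigmoid scores).
        self.conf_heads = nn.ModuleList(nn.Linear(d, 1) for d in dims)
        self.rank, self.out_dim = rank, out_dim

    def forward(self, feats):
        # feats: list of (batch, d_m) modality features.
        B = feats[0].shape[0]
        fused = None
        for x, proj, conf_head in zip(feats, self.factors, self.conf_heads):
            conf = torch.sigmoid(conf_head(x))             # (B, 1) confidence
            z = proj(x).view(B, self.rank, self.out_dim)   # rank-wise factors
            z = conf.unsqueeze(-1) * z                     # dynamic weighting
            fused = z if fused is None else fused * z      # cross-modal product
        # Attention over the rank dimension, then sum the rank slices.
        alpha = torch.softmax(self.rank_logits, dim=0).view(1, self.rank, 1)
        return (alpha * fused).sum(dim=1)                  # (B, out_dim)

# Usage: fuse three modalities of different dimensionality.
fusion = LowRankFusionWithRankAttention(dims=[64, 128, 32], out_dim=256, rank=4)
h = fusion([torch.randn(8, 64), torch.randn(8, 128), torch.randn(8, 32)])
print(h.shape)  # torch.Size([8, 256])
```

The design point the abstract emphasizes is that this factorized product is linear in the number of modalities, whereas contracting an explicit outer-product tensor grows exponentially with modality count; the rank attention then lets the model weight the rank slices unevenly rather than summing them uniformly.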