DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification

Cited by: 3
Authors
Zou, Xin [1 ]
Tang, Chang [1 ]
Zheng, Xiao [2 ]
Li, Zhenglai [1 ]
He, Xiao [1 ]
An, Shan [3 ]
Liu, Xinwang [2 ]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha, Peoples R China
[3] JD Hlth Int Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Key R&D Program of China; National Science Foundation (USA);
Keywords
trustworthy multi-modal classification; confidence estimation; dynamical fusion; attention; cross-modal low-rank fusion; FUSION;
DOI
10.1145/3581783.3612652
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With advances in sensing technology, multi-modal data collected from different sources are increasingly available. Multi-modal classification aims to integrate complementary information from multi-modal data to improve classification performance. However, existing multi-modal classification methods are generally weak at integrating global structural information and at providing trustworthy multi-modal fusion, especially in safety-sensitive practical applications (e.g., medical diagnosis). In this paper, we propose a novel Dynamic Poly-attention Network (DPNET) for trustworthy multi-modal classification. Specifically, DPNET has four merits: (i) to capture the intrinsic modality-specific structural information, we design a structure-aware feature aggregation module that learns a structure-preserved, globally compact feature representation; (ii) a transparent fusion strategy based on modality confidence estimation is introduced to track information variation across modalities for dynamic fusion; (iii) to make multi-modal fusion more effective and efficient, we introduce a cross-modal low-rank fusion module that reduces the complexity of tensor-based fusion and activates different rank-wise features via a rank attention mechanism; (iv) a label confidence estimation module is devised to drive the network to produce more credible confidence scores. An intra-class attention loss is introduced to supervise network training. Extensive experiments on four real-world multi-modal biomedical datasets demonstrate that the proposed method achieves competitive performance compared with state-of-the-art methods.
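The cross-modal low-rank fusion with rank attention described in point (iii) can be made concrete with a short sketch. Below is a minimal PyTorch illustration, assuming rank-R factorized bilinear fusion in the spirit of low-rank multimodal fusion, with a learned softmax attention over the R rank-wise features in place of the usual fixed summation. All class and variable names, tensor shapes, and the exact attention form here are assumptions for illustration, not the authors' DPNET implementation.

import torch
import torch.nn as nn

class LowRankFusionWithRankAttention(nn.Module):
    # Hypothetical sketch (not the authors' code): fuses M modality
    # embeddings via rank-R factorized bilinear fusion, then reweights
    # the R rank-wise fused features with a learned attention.
    def __init__(self, in_dims, out_dim, rank):
        super().__init__()
        self.rank = rank
        # One rank-R factor tensor per modality; appending a constant 1
        # to each input keeps unimodal terms in the fused product.
        self.factors = nn.ParameterList(
            [nn.Parameter(0.02 * torch.randn(rank, d + 1, out_dim)) for d in in_dims]
        )
        # Rank attention: scores each of the R rank-wise fused features.
        self.rank_attn = nn.Linear(out_dim, 1)

    def forward(self, feats):
        # feats: list of (batch, d_m) modality embeddings.
        batch = feats[0].size(0)
        fused = None
        for x, w in zip(feats, self.factors):
            x1 = torch.cat([x, x.new_ones(batch, 1)], dim=1)  # (batch, d_m + 1)
            proj = torch.einsum('bd,rdo->bro', x1, w)         # (batch, rank, out_dim)
            fused = proj if fused is None else fused * proj   # product across modalities
        # Attention over the rank axis replaces the usual fixed summation.
        attn = torch.softmax(self.rank_attn(fused).squeeze(-1), dim=1)  # (batch, rank)
        return (attn.unsqueeze(-1) * fused).sum(dim=1)        # (batch, out_dim)

# Usage: fuse three modalities of different dimensionality.
fusion = LowRankFusionWithRankAttention(in_dims=[128, 64, 32], out_dim=100, rank=4)
feats = [torch.randn(8, d) for d in (128, 64, 32)]
print(fusion(feats).shape)  # torch.Size([8, 100])

The factorized form keeps the parameter count linear in the number of modalities rather than exponential, which is the complexity reduction the abstract refers to; the attention over the rank axis is one plausible reading of "rank attention".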
Pages: 3550 - 3559
Page count: 10
Related Papers
50 records in total
  • [22] An attention based multi-modal gender identification system for social media users
    Suman, Chanchal
    Chaudhary, Rohit Shyamkant
    Saha, Sriparna
    Bhattacharyya, Pushpak
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (19) : 27033 - 27055
  • [23] Multi-modal Quality Prediction Algorithm Based on Anomalous Energy Tracking Attention
    Li, Haoyong
    Zhang, Qifei
    Li, Wenjuan
    Liang, Xiubo
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT II, ICIC 2024, 2024, 14876 : 150 - 162
  • [24] Multi-modal interactive attention and dual progressive decoding network for RGB-D/T salient object detection
    Liang, Yanhua
    Qin, Guihe
    Sun, Minghui
    Qin, Jun
    Yan, Jie
    Zhang, Zhonghan
    NEUROCOMPUTING, 2022, 490 : 132 - 145
  • [25] A scalable multi-modal learning fruit detection algorithm for dynamic environments
    Mao, Liang
    Guo, Zihao
    Liu, Mingzhe
    Li, Yue
    Wang, Linlin
    Li, Jie
    FRONTIERS IN NEUROROBOTICS, 2025, 18
  • [26] Multi-Modal Convolutional Parameterisation Network for Guided Image Inverse Problems
    Czerkawski, Mikolaj
    Upadhyay, Priti
    Davison, Christopher
    Atkinson, Robert
    Michie, Craig
    Andonovic, Ivan
    Macdonald, Malcolm
    Cardona, Javier
    Tachtatzis, Christos
    JOURNAL OF IMAGING, 2024, 10 (03)
  • [27] Pedestrian detection network with multi-modal cross-guided learning
    Hua, ChunJian
    Sun, MingChun
    Zhu, Yu
    Jiang, Yi
    Yu, JianFeng
    Chen, Ying
    DIGITAL SIGNAL PROCESSING, 2022, 122
  • [28] Efficient and Effective Multi-Modal Queries Through Heterogeneous Network Embedding
    Chi Thang Duong
    Thanh Tam Nguyen
    Yin, Hongzhi
    Weidlich, Matthias
    Mai, Thai Son
    Aberer, Karl
    Quoc Viet Hung Nguyen
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (11) : 5307 - 5320
  • [29] AutoAMS: Automated attention-based multi-modal graph learning architecture search
    Al-Sabri, Raeed
    Gao, Jianliang
    Chen, Jiamin
    Oloulade, Babatounde Moctard
    Wu, Zhenpeng
    NEURAL NETWORKS, 2024, 179
  • [30] Air Quality Prediction with 1-Dimensional Convolution and Attention on Multi-modal Features
    Choi, Junyoung
    Kim, Joonyoung
    Jung, Kyomin
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2021), 2021, : 196 - 202