DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification

Citations: 3
Authors
Zou, Xin [1 ]
Tang, Chang [1 ]
Zheng, Xiao [2 ]
Li, Zhenglai [1 ]
He, Xiao [1 ]
An, Shan [3 ]
Liu, Xinwang [2 ]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha, Peoples R China
[3] JD Hlth Int Inc, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Key Research and Development Program of China; U.S. National Science Foundation;
Keywords
trustworthy multi-modal classification; confidence estimation; dynamical fusion; attention; cross-modal low-rank fusion
DOI
10.1145/3581783.3612652
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
With advances in sensing technology, multi-modal data collected from different sources are increasingly available. Multi-modal classification aims to integrate complementary information from multi-modal data to improve model classification performance. However, existing multi-modal classification methods are generally weak at integrating global structural information and providing trustworthy multi-modal fusion, especially in safety-sensitive practical applications (e.g., medical diagnosis). In this paper, we propose a novel Dynamic Poly-attention Network (DPNET) for trustworthy multi-modal classification. Specifically, DPNET has four merits: (i) To capture the intrinsic modality-specific structural information, we design a structure-aware feature aggregation module that learns a structure-preserved global compact feature representation. (ii) A transparent fusion strategy based on modality confidence estimation is introduced to track information variation within different modalities for dynamic fusion. (iii) To facilitate more effective and efficient multi-modal fusion, we introduce a cross-modal low-rank fusion module that reduces the complexity of tensor-based fusion and activates the contribution of different rank-wise features via a rank attention mechanism. (iv) A label confidence estimation module is devised to drive the network to generate more credible confidence, and an intra-class attention loss is introduced to supervise the network training. Extensive experiments on four real-world multi-modal biomedical datasets demonstrate that the proposed method achieves competitive performance compared to state-of-the-art methods.
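Point (iii) of the abstract echoes the well-known low-rank multimodal fusion idea (LMF, Liu et al., ACL 2018): instead of materializing the full outer-product fusion tensor, each modality is projected by rank-R factor matrices and the rank-wise features are combined by element-wise products across modalities; DPNET additionally weights the R fused features with a rank attention before summation (replacing LMF's fixed per-rank weights). The sketch below is a minimal illustration under those assumptions, not the authors' implementation; all names (`LowRankFusion`, `rank_attn`, `rank`, `out_dim`) are hypothetical.

```python
# Hedged sketch: cross-modal low-rank fusion with rank attention,
# in the spirit of LMF plus the abstract's rank attention mechanism.
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dims, out_dim, rank):
        super().__init__()
        # One rank-R factor matrix per modality; appending a constant 1
        # to each input lets the factors absorb bias terms.
        self.factors = nn.ModuleList(
            nn.Linear(d + 1, rank * out_dim, bias=False) for d in dims
        )
        self.rank = rank
        self.out_dim = out_dim
        # Rank attention: scores each of the R rank-wise fused features.
        self.rank_attn = nn.Linear(out_dim, 1)

    def forward(self, feats):  # feats: list of (B, d_m) tensors
        fused = None
        for x, proj in zip(feats, self.factors):
            ones = torch.ones(x.size(0), 1, device=x.device)
            h = proj(torch.cat([x, ones], dim=1))      # (B, R*out_dim)
            h = h.view(-1, self.rank, self.out_dim)    # (B, R, out_dim)
            # Element-wise product accumulates the cross-modal interaction.
            fused = h if fused is None else fused * h
        attn = torch.softmax(self.rank_attn(fused), dim=1)  # (B, R, 1)
        return (attn * fused).sum(dim=1)               # (B, out_dim)

# Usage: fuse three modalities of dims 128, 64, 32 into a 256-d feature.
model = LowRankFusion(dims=[128, 64, 32], out_dim=256, rank=4)
z = model([torch.randn(8, 128), torch.randn(8, 64), torch.randn(8, 32)])
print(z.shape)  # torch.Size([8, 256])
```

The rank-R factorization keeps the cost linear in the number of modalities (one projection plus an element-wise product each), versus the exponential blow-up of a full outer-product tensor fusion.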
Pages: 3550-3559
Page Count: 10
Related Papers
50 records in total
  • [31] Zhou, Tongxue; Canu, Stephane; Ruan, Su. Fusion based on attention mechanism and context constraint for multi-modal brain tumor segmentation. COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, 2020, 86.
  • [32] Ekambaram, Vijay; Manglik, Kushagra; Mukherjee, Sumanta; Sajja, Surya Shravan Kumar; Dwivedi, Satyam; Raykar, Vikas. Attention based Multi-Modal New Product Sales Time-series Forecasting. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020: 3110-3118.
  • [33] Banerjee, Ankan; Patra, Dipti; Roy, Pradipta. Improved Multi-modal Image Fusion with Attention and Dense Networks: Visual and Quantitative Evaluation. COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III, 2024, 2011: 237-248.
  • [34] Anundskas, Lars Halvor; Afridi, Hina; Tarekegn, Adane Nega; Yamin, Muhammad Mudassar; Ullah, Mohib; Yamin, Saira; Cheikh, Faouzi Alaya. GLOVE-ING ATTENTION: A MULTI-MODAL NEURAL LEARNING APPROACH TO IMAGE CAPTIONING. 2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023.
  • [35] Ding, Yue; Liu, Jingjing; Zhang, Xiaochen; Yang, Zhi. Dynamic Tracking of State Anxiety via Multi-Modal Data and Machine Learning. FRONTIERS IN PSYCHIATRY, 2022, 13.
  • [36] Lin, Feng; Hu, Xie; Lin, Yiling; Li, Yao; Liu, Yang; Li, Dongmei. Dual-branch multi-modal convergence network for crater detection using Chang'e image. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 134.
  • [37] Zhang, Yu; Fu, Zilong; Huang, Fuyu; Liu, Yizhi. PMMN: Pre-trained multi-Modal network for scene text recognition. PATTERN RECOGNITION LETTERS, 2021, 151: 103-111.
  • [38] Yang, Yang; Yang, Jia-Qi; Bao, Ran; Zhan, De-Chuan; Zhu, Hengshu; Gao, Xiao-Ru; Xiong, Hui; Yang, Jian. Corporate Relative Valuation Using Heterogeneous Multi-Modal Graph Neural Network. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (01): 211-224.
  • [39] Zhou, Wei; An, Lu; Han, Ruilian; Li, Gang. Classification and severity assessment of disaster losses based on multi-modal information in social media. INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (05).
  • [40] Aspri, Maria; Tsagkatakis, Grigorios; Tsakalides, Panagiotis. Distributed Training and Inference of Deep Learning Models for Multi-Modal Land Cover Classification. REMOTE SENSING, 2020, 12 (17).