A unified multimodal classification framework based on deep metric learning
Cited by: 0
Authors:
Peng, Liwen [1,2]
Jian, Songlei [2]
Li, Minne [1]
Kan, Zhigang [1]
Qiao, Linbo [2]
Li, Dongsheng [2]
Affiliations:
[1] Intelligent Game & Decis Lab, Beijing 100080, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
Abstract:
Multimodal classification algorithms play an essential role in multimodal machine learning, aiming to categorize distinct data points by analyzing data characteristics from multiple modalities. Extensive research has been conducted on distilling multimodal attributes and devising specialized fusion strategies for targeted classification tasks. Nevertheless, current algorithms mainly concentrate on a specific classification task and process only the data of the corresponding modalities. To address these limitations, we propose a unified multimodal classification framework (UMCF) proficient in handling diverse multimodal classification tasks and processing data from disparate modalities. UMCF is task-independent, and its unimodal feature extraction module can be adaptively substituted to accommodate data from diverse modalities. Moreover, we construct a multimodal learning scheme based on deep metric learning to mine latent characteristics within multimodal data. Specifically, we design metric-based triplet learning to extract the intra-modal relationships within each modality and contrastive pairwise learning to capture the inter-modal relationships across modalities. Extensive experiments on two multimodal classification tasks, fake news detection and sentiment analysis, demonstrate that UMCF can extract multimodal data features and achieve superior classification performance compared with task-specific benchmarks. UMCF outperforms the best fake news detection baselines by 2.3% on average in terms of F1 score.
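The abstract names two deep-metric-learning objectives: a triplet loss for intra-modal structure and a contrastive pairwise loss for inter-modal alignment. The paper's exact formulations and hyperparameters are not given here, so the sketch below shows only the standard textbook forms of these two losses (Euclidean distance, margin of 1.0, and the helper names are all assumptions for illustration, not the authors' implementation):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard metric-based triplet loss (intra-modal): pull the anchor
    toward a same-class sample and push it at least `margin` away from
    a different-class sample within the same modality."""
    return max(0.0, euclidean(anchor, positive)
                    - euclidean(anchor, negative) + margin)

def contrastive_pair_loss(x, y, same_point, margin=1.0):
    """Standard contrastive pairwise loss (inter-modal): align embeddings
    of the same data point coming from two modalities; separate embeddings
    of different data points by at least `margin`."""
    d = euclidean(x, y)
    if same_point:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy 2-D embeddings: a well-separated triplet incurs zero loss,
# while a matched cross-modal pair is penalized by its squared distance.
print(triplet_loss([0.0, 0.0], [0.0, 0.0], [2.0, 0.0]))        # 0.0
print(contrastive_pair_loss([0.0, 0.0], [1.0, 0.0], True))     # 1.0
print(contrastive_pair_loss([0.0, 0.0], [2.0, 0.0], False))    # 0.0
```

In a full pipeline these per-sample terms would be averaged over mined triplets and cross-modal pairs and summed with the classification loss; the framework described above swaps the unimodal encoders per task while keeping these objectives fixed.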