A unified multimodal classification framework based on deep metric learning

Citations: 0

Authors:

Peng, Liwen [1,2]

Jian, Songlei [2]

Li, Minne [1]

Kan, Zhigang [1]

Qiao, Linbo [2]

Li, Dongsheng [2]

Affiliations:

[1] Intelligent Game & Decis Lab, Beijing 100080, Peoples R China

[2] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China

Source:

NEURAL NETWORKS | 2025 / Volume 181

Funding:

National Natural Science Foundation of China;

Keywords:

Multimodal classification; Deep metric learning; Multimodal learning; Fake news detection; Sentiment analysis; FUSION;

DOI:

10.1016/j.neunet.2024.106747

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multimodal classification algorithms play an essential role in multimodal machine learning, aiming to categorize data points by analyzing their characteristics across multiple modalities. Extensive research has been conducted on distilling multimodal attributes and devising specialized fusion strategies for targeted classification tasks. Nevertheless, current algorithms mainly concentrate on a single classification task and process data only from the modalities that task involves. To address these limitations, we propose a unified multimodal classification framework (UMCF) capable of handling diverse multimodal classification tasks and processing data from disparate modalities. UMCF is task-independent, and its unimodal feature extraction module can be adaptively substituted to accommodate data from different modalities. Moreover, we construct the multimodal learning scheme based on deep metric learning to mine latent characteristics within multimodal data. Specifically, we design metric-based triplet learning to extract the intra-modal relationships within each modality and contrastive pairwise learning to capture the inter-modal relationships across modalities. Extensive experiments on two multimodal classification tasks, fake news detection and sentiment analysis, demonstrate that UMCF can extract multimodal data features and achieve better classification performance than task-specific benchmarks. UMCF outperforms the best fake news detection baselines by 2.3% on average in F1 score.
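The abstract names two metric-learning objectives, triplet learning within a modality and contrastive pairwise learning across modalities, without giving their exact form. The sketch below shows the standard textbook versions of both losses as a rough illustration only. The function names, the Euclidean distance, and the margin of 1.0 are assumptions made here and are not taken from the paper, whose actual formulation may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss (assumed form, not the paper's).

    All three embeddings come from the SAME modality. The loss is zero once
    the anchor is closer to the positive than to the negative by `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def contrastive_pair_loss(x, y, same, margin=1.0):
    """Standard pairwise contrastive loss (assumed form, not the paper's).

    `x` and `y` are embeddings from two DIFFERENT modalities, e.g. text and
    image. Matching pairs are pulled together; non-matching pairs are pushed
    at least `margin` apart.
    """
    d = euclidean(x, y)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy 2-D embeddings, chosen only to make the arithmetic easy to follow.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])   # negative already far away
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 0.5])   # negative closer than positive
match = contrastive_pair_loss([0.0, 0.0], [3.0, 4.0], same=True)
mismatch = contrastive_pair_loss([0.0, 0.0], [0.0, 0.5], same=False)
```

In this toy setup the "easy" triplet contributes no loss because the negative is already well separated, while the "hard" triplet is penalized. In a trained model the vectors would be outputs of the unimodal feature extractors rather than hand-written lists.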


Pages: 12
