A unified multimodal classification framework based on deep metric learning

Authors
Peng, Liwen [1, 2]
Jian, Songlei [2]
Li, Minne [1]
Kan, Zhigang [1]
Qiao, Linbo [2]
Li, Dongsheng [2]
Affiliations
[1] Intelligent Game and Decision Lab, Beijing 100080, People's Republic of China
[2] College of Computer, National University of Defense Technology, Changsha 410073, Hunan, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Multimodal classification; Deep metric learning; Multimodal learning; Fake news detection; Sentiment analysis; Fusion
DOI
10.1016/j.neunet.2024.106747
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal classification algorithms play an essential role in multimodal machine learning, aiming to categorize distinct data points by analyzing data characteristics from multiple modalities. Extensive research has been conducted on distilling multimodal attributes and devising specialized fusion strategies for targeted classification tasks. Nevertheless, current algorithms mainly concentrate on a single classification task and process data only from that task's modalities. To address these limitations, we propose a unified multimodal classification framework (UMCF) capable of handling diverse multimodal classification tasks and processing data from disparate modalities. UMCF is task-independent, and its unimodal feature extraction module can be substituted as needed to accommodate data from different modalities. Moreover, we build the multimodal learning scheme on deep metric learning to mine latent characteristics within multimodal data. Specifically, we design metric-based triplet learning to extract the intra-modal relationships within each modality and contrastive pairwise learning to capture the inter-modal relationships across modalities. Extensive experiments on two multimodal classification tasks, fake news detection and sentiment analysis, demonstrate that UMCF can extract multimodal data features and achieve better classification performance than task-specific baselines. UMCF outperforms the best fake news detection baselines by 2.3% on average in F1 score.
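
The abstract names two metric-learning objectives without giving their formulas. As a rough illustration only, the sketch below uses the textbook triplet margin loss for the intra-modal term and the textbook pairwise contrastive loss for the inter-modal term. The function names, the Euclidean distance, the margin values, and the toy embeddings are all assumptions made for this example; the paper's actual losses may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet margin loss (assumed form, not taken from the paper).

    Within one modality, the anchor should sit closer to a same-class
    sample (positive) than to a different-class sample (negative) by at
    least `margin`; otherwise the violation is penalized.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def contrastive_pair_loss(x, y, matched, margin=1.0):
    """Textbook pairwise contrastive loss (assumed form, not taken from the paper).

    Across modalities, embeddings of a matched pair (e.g. the text and
    image of the same post) are pulled together; unmatched pairs are
    pushed at least `margin` apart.
    """
    d = euclidean(x, y)
    if matched:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings, standing in for encoder outputs.
text_a = [0.0, 0.0]      # text embedding of sample A (class 1)
text_b = [0.0, 1.0]      # text embedding of sample B (class 1)
text_c = [3.0, 4.0]      # text embedding of sample C (class 2)
image_a = [0.0, 0.5]     # image embedding of sample A

intra = triplet_loss(text_a, text_b, text_c)                   # within the text modality
inter = contrastive_pair_loss(text_a, image_a, matched=True)   # text vs. image of sample A
total = intra + inter                                          # unweighted sum, for illustration
```

In a real system the vectors would come from the replaceable unimodal feature extractors the abstract mentions, and the two terms would typically be weighted and combined with a classification loss; those weights are not specified in the abstract.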
Pages: 12