A unified multimodal classification framework based on deep metric learning
Cited by: 0
Authors:
Peng, Liwen [1,2]
Jian, Songlei [2]
Li, Minne [1]
Kan, Zhigang [1]
Qiao, Linbo [2]
Li, Dongsheng [2]
Affiliations:
[1] Intelligent Game & Decis Lab, Beijing 100080, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
Abstract:
Multimodal classification algorithms play an essential role in multimodal machine learning, aiming to categorize distinct data points by analyzing data characteristics from multiple modalities. Extensive research has been conducted on distilling multimodal attributes and devising specialized fusion strategies for targeted classification tasks. Nevertheless, current algorithms mainly concentrate on a specific classification task and process only the data of the corresponding modalities. To address these limitations, we propose a unified multimodal classification framework (UMCF) proficient in handling diverse multimodal classification tasks and processing data from disparate modalities. UMCF is task-independent, and its unimodal feature extraction module can be adaptively substituted to accommodate data from diverse modalities. Moreover, we construct a multimodal learning scheme based on deep metric learning to mine latent characteristics within multimodal data. Specifically, we design metric-based triplet learning to extract the intra-modal relationships within each modality and contrastive pairwise learning to capture the inter-modal relationships across modalities. Extensive experiments on two multimodal classification tasks, fake news detection and sentiment analysis, demonstrate that UMCF can extract multimodal data features and achieve superior classification performance compared with task-specific benchmarks. UMCF outperforms the best fake news detection baselines by 2.3% on average in terms of F1 score.
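The abstract names two deep-metric-learning objectives: a triplet loss for intra-modal structure and a contrastive pairwise loss for inter-modal alignment. The paper's exact formulations and hyperparameters are not given here, so the sketch below shows only the standard textbook forms of these two losses (Euclidean distance, margin of 1.0, and the helper names are all assumptions for illustration, not the authors' implementation):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard metric-based triplet loss (intra-modal): pull the anchor
    toward a same-class sample and push it at least `margin` away from
    a different-class sample within the same modality."""
    return max(0.0, euclidean(anchor, positive)
                    - euclidean(anchor, negative) + margin)

def contrastive_pair_loss(x, y, same_point, margin=1.0):
    """Standard contrastive pairwise loss (inter-modal): align embeddings
    of the same data point coming from two modalities; separate embeddings
    of different data points by at least `margin`."""
    d = euclidean(x, y)
    if same_point:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy 2-D embeddings: a well-separated triplet incurs zero loss,
# while a matched cross-modal pair is penalized by its squared distance.
print(triplet_loss([0.0, 0.0], [0.0, 0.0], [2.0, 0.0]))        # 0.0
print(contrastive_pair_loss([0.0, 0.0], [1.0, 0.0], True))     # 1.0
print(contrastive_pair_loss([0.0, 0.0], [2.0, 0.0], False))    # 0.0
```

In a full pipeline these per-sample terms would be averaged over mined triplets and cross-modal pairs and summed with the classification loss; the framework described above swaps the unimodal encoders per task while keeping these objectives fixed.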