Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples

Cited by: 1
Authors
Fan, Qi [1 ]
Li, Yutong [2 ]
Xin, Yi [3 ]
Cheng, Xinyu [3 ]
Gao, Guanglai [1 ]
Ma, Miao [2 ]
Affiliations
[1] Inner Mongolia Univ, Hohhot, Peoples R China
[2] Shaanxi Normal Univ, Xian, Peoples R China
[3] Nanjing Univ, Nanjing, Peoples R China
Source
PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMODAL AND RESPONSIBLE AFFECTIVE COMPUTING, MRAC 2024 | 2024
Funding
National Natural Science Foundation of China;
关键词
Multimodal Emotion Recognition; Semi-Supervised Learning; Contrastive Learning; Multi-Classifier Voting;
DOI
10.1145/3689092.3689412
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Multimodal Emotion Recognition challenge MER2024 focuses on recognizing emotions from audio, language, and visual signals. In this paper, we present our submission for the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI), which tackles the problem of limited annotated data in emotion recognition. First, to address class imbalance, we adopt an oversampling strategy. Second, we propose a modality representation combinatorial contrastive learning (MR-CCL) framework on the trimodal input data to establish robust initial models. Third, we explore a self-training approach to expand the training set. Finally, we enhance prediction robustness through a multi-classifier weighted soft voting strategy. Our method proves effective on the MER2024-SEMI Challenge, achieving a weighted average F-score of 88.25% and ranking 6th on the leaderboard. Our project is available at https://github.com/WooyoohL/MER2024-SEMI.
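The multi-classifier weighted soft voting mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of classes, the per-classifier weights, and the probability vectors below are illustrative assumptions.

```python
# Weighted soft voting: average the class-probability vectors produced by
# several classifiers, weighting each classifier, then take the argmax.
def weighted_soft_vote(prob_lists, weights):
    """prob_lists: one probability vector per classifier (same length each);
    weights: one non-negative weight per classifier.
    Returns (predicted class index, fused probability vector)."""
    n_classes = len(prob_lists[0])
    total = sum(weights)
    fused = [
        sum(w * probs[c] for probs, w in zip(prob_lists, weights)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: fused[c]), fused

# Example: three hypothetical classifiers over four emotion classes.
probs = [
    [0.6, 0.2, 0.1, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.5, 0.3, 0.1, 0.1],
]
label, fused = weighted_soft_vote(probs, weights=[0.5, 0.3, 0.2])
```

Soft voting fuses the full probability distributions rather than hard labels, so a classifier that is confidently right can outweigh two that are weakly wrong; the weights let stronger classifiers contribute more.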
Pages: 72-77
Page count: 6