Audio-Guided Fusion Techniques for Multimodal Emotion Analysis

Times cited: 0
Authors
Shi, Pujin [1]
Gao, Fei [1]
Affiliation
[1] Beijing University of Posts and Telecommunications, School of Cyberspace Security, State Key Laboratory of Networking and Switching Technology, Beijing, People's Republic of China
Source
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC 2024), 2024
Keywords
Multimodal emotion recognition; Multimodal feature fusion; Self-supervised learning; RECOGNITION; SLEEP
DOI
10.1145/3689092.3689414
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a solution for the semi-supervised learning track (MER-SEMI) of MER2024. First, to improve the performance of the feature extractors on sentiment classification, we fine-tune the video and text feature extractors, CLIP-vit-large and Baichuan-13B respectively, on labeled data. This approach effectively preserves the original emotional information conveyed in the videos. Second, we propose an Audio-Guided Transformer (AGT) fusion mechanism, which leverages the robustness of HuBERT-large and is particularly effective at fusing both inter-channel and intra-channel information. Third, to improve model accuracy, we iteratively apply self-supervised learning, using high-confidence predictions on unlabeled data as pseudo-labels. Finally, black-box probing revealed an imbalanced data distribution between the training and test sets, so we adopt a prior-knowledge-based voting mechanism. The results demonstrate the effectiveness of our strategy, which ultimately earned us third place in the MER-SEMI track.
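The iterative pseudo-labeling step mentioned in the abstract can be pictured with a minimal sketch, assuming the model outputs a softmax probability vector per unlabeled sample and that a fixed confidence threshold decides which predictions become pseudo-labels. The function names `select_pseudo_labels` and `merge_pseudo_labels`, the 0.9 threshold, and the toy data are all hypothetical; this record does not describe the paper's actual selection rule, threshold, or number of rounds.

```python
from typing import Hashable, List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Return (sample_index, predicted_class, confidence) for every unlabeled
    sample whose highest class probability reaches `threshold`."""
    selected = []
    for idx, row in enumerate(probs):
        # The predicted class is the arg-max of the probability vector.
        label = max(range(len(row)), key=lambda c: row[c])
        confidence = row[label]
        if confidence >= threshold:
            selected.append((idx, label, confidence))
    return selected


def merge_pseudo_labels(
    labeled: Sequence[Tuple[Hashable, int]],
    unlabeled: Sequence[Hashable],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> Tuple[List[Tuple[Hashable, int]], List[Hashable]]:
    """One round of pseudo-labeling: move confident unlabeled samples
    (with their predicted class) into the labeled pool."""
    picks = select_pseudo_labels(probs, threshold)
    chosen = {idx for idx, _, _ in picks}
    new_labeled = list(labeled) + [(unlabeled[idx], label) for idx, label, _ in picks]
    remaining = [x for i, x in enumerate(unlabeled) if i not in chosen]
    return new_labeled, remaining


if __name__ == "__main__":
    toy_probs = [
        [0.95, 0.03, 0.02],  # confident: pseudo-labeled as class 0
        [0.40, 0.35, 0.25],  # uncertain: stays unlabeled
        [0.05, 0.05, 0.90],  # confident: pseudo-labeled as class 2
    ]
    labeled, remaining = merge_pseudo_labels([("a", 1)], ["u0", "u1", "u2"], toy_probs)
    print(labeled)    # [('a', 1), ('u0', 0), ('u2', 2)]
    print(remaining)  # ['u1']
```

In an iterative scheme, the model would be retrained on the enlarged labeled pool and the round repeated on the remaining samples; a high threshold trades fewer pseudo-labels for less label noise.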
Pages: 62-66
Number of pages: 5