Audio-Guided Fusion Techniques for Multimodal Emotion Analysis

Cited by: 0

Authors:

Shi, Pujin [1]

Gao, Fei [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, State Key Lab Networking & Switching Technol, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMODAL AND RESPONSIBLE AFFECTIVE COMPUTING, MRAC 2024 | 2024

Keywords:

Multimodal emotion recognition; Multimodal feature fusion; Self-supervised learning; RECOGNITION; SLEEP;

DOI:

10.1145/3689092.3689414

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose a solution for the semi-supervised learning track (MER-SEMI) of MER2024. First, to improve the feature extractors' performance on sentiment classification tasks, we fine-tune the video and text feature extractors, specifically CLIP-vit-large and Baichuan-13B, using labeled data. This approach effectively preserves the original emotional information conveyed in the videos. Second, we propose an Audio-Guided Transformer (AGT) fusion mechanism, which leverages the robustness of Hubert-large and shows superior effectiveness in fusing both inter-channel and intra-channel information. Third, to enhance the accuracy of the model, we iteratively apply self-supervised learning, using high-confidence predictions on unlabeled data as pseudo-labels. Finally, through black-box probing, we discovered an imbalanced data distribution between the training and test sets. We therefore adopt a prior-knowledge-based voting mechanism. The results demonstrate the effectiveness of our strategy, which ultimately earned us third place in the MER-SEMI track.
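
The pseudo-labeling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_pseudo_labels`, the 0.9 threshold, and the toy probabilities are all assumptions made for illustration. It only shows the general idea of keeping unlabeled samples whose top predicted class probability clears a confidence cutoff.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) pairs for confident predictions.

    probs holds one class-probability vector per unlabeled sample.
    threshold is a hypothetical cutoff, not a value taken from the paper.
    """
    selected = []
    for i, p in enumerate(probs):
        # Index of the most probable class for this sample.
        best = max(range(len(p)), key=lambda c: p[c])
        # Keep the sample only if the model is confident enough.
        if p[best] >= threshold:
            selected.append((i, best))
    return selected


# Toy predictions for three unlabeled clips over three emotion classes.
unlabeled_probs = [
    [0.95, 0.03, 0.02],  # confident: kept as class 0
    [0.40, 0.35, 0.25],  # uncertain: left unlabeled
    [0.05, 0.02, 0.93],  # confident: kept as class 2
]
pseudo = select_pseudo_labels(unlabeled_probs, threshold=0.9)
print(pseudo)  # [(0, 0), (2, 2)]
```

In an iterative scheme of the kind the abstract describes, the selected pairs would be added to the labeled set, the model retrained, and the selection repeated on the remaining unlabeled data.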

Citation

Pages: 62-66

Page count: 5

References (36 in total)

[1] Abdullah S. M. S. A., 2021, Journal of Applied Science and Technology Trends, V2, P73

[2] Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning [J]. Arazo, Eric; Ortego, Diego; Albert, Paul; O'Connor, Noel E.; McGuinness, Kevin. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020

[3] Measuring emotions in education using wearable devices: A systematic review [J]. Ba, Shen; Hu, Xiao. COMPUTERS & EDUCATION, 2023, 200

[4] Semi-Supervised Multimodal Emotion Recognition with Class-Balanced Pseudo-Labeling [J]. Chen, Haifeng; Guo, Chujia; Li, Yan; Zhang, Peng; Jiang, Dongmei. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 9556-9560

[5] Sleep in Children and Adolescents with Behavioral and Emotional Disorders [J]. Dahl, Ronald E.; Harvey, Allison G. SLEEP MEDICINE CLINICS, 2007, 2 (03): 501-+

[6] Ding Chaoyue, 2023, P 1 INT WORKSH MULT

[7] Rethinking Pseudo-Labeling for Semi-Supervised Facial Expression Recognition With Contrastive Self-Supervised Learning [J]. Fang, Bei; Li, Xian; Han, Guangxin; He, Juhou. IEEE ACCESS, 2023, 11: 45547-45558

[8] FEINBERG TE, 1986, ARCH GEN PSYCHIAT, V43, P276

[9] Masked Autoencoders Are Scalable Vision Learners [J]. He, Kaiming; Chen, Xinlei; Xie, Saining; Li, Yanghao; Dollar, Piotr; Girshick, Ross. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 15979-15988

[10] He Y., 2022, P 3 INT MULT SENT AN, P61, DOI 10.1145/3551876.3554811
