Emotion recognition in conversations (ERC) plays a crucial role in human–computer interaction and affective computing. However, existing ERC methods face several challenges, including insufficient data annotations and the difficulty of effectively integrating multimodal information. To overcome these challenges, we propose SSMIM, a novel semi-supervised multimodal emotion recognition framework. SSMIM enhances emotional feature representation through a primary-modality-guided strategy that combines intra-modality representations with cross-modality interactions. It further employs a context modeling approach that uses a directed acyclic graph and a bidirectional gated recurrent unit to capture contextual dependencies in dialogues from both multimodal and primary-modality perspectives, thereby improving emotion classification accuracy. Moreover, to handle dynamic data and limited annotations in real-time scenarios, SSMIM integrates an online learning mechanism that leverages pseudo-label generation and self-training to mitigate the scarcity of labeled data and to adapt the model to real-time changes in dialogue context. Experimental results on IEMOCAP, MELD, and CMU-MOSEI show that SSMIM outperforms existing methods and achieves state-of-the-art performance.
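As a rough illustration of the pseudo-label self-training mechanism mentioned above, the sketch below shows a generic confidence-thresholded self-training step. The model, data batches, and threshold value are hypothetical assumptions introduced for illustration only, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch (not the authors' code): one self-training update that
# combines a supervised loss on labeled turns with a pseudo-label loss on
# confident predictions for unlabeled turns.

CONF_THRESHOLD = 0.9  # assumed confidence cutoff for keeping pseudo-labels

def self_training_step(model, optimizer, labeled_batch, unlabeled_batch):
    feats_l, labels = labeled_batch      # features and gold labels
    feats_u = unlabeled_batch            # features without annotations

    # Supervised loss on the small annotated set.
    logits_l = model(feats_l)
    loss = F.cross_entropy(logits_l, labels)

    # Generate pseudo-labels for unlabeled dialogue turns.
    with torch.no_grad():
        probs_u = F.softmax(model(feats_u), dim=-1)
        conf, pseudo = probs_u.max(dim=-1)
        mask = conf >= CONF_THRESHOLD

    # Add the unsupervised term only for high-confidence predictions.
    if mask.any():
        logits_u = model(feats_u[mask])
        loss = loss + F.cross_entropy(logits_u, pseudo[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```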