ESERNet: Learning spectrogram structure relationship for effective speech emotion recognition with swin transformer in classroom discourse analysis

Cited by: 2
Authors
Liu, Tingting [1 ,2 ]
Wang, Minghong [1 ]
Yang, Bing [2 ]
Liu, Hai [2 ,3 ]
Yi, Shaoxin [3 ]
Affiliations
[1] Univ Hong Kong, Fac Educ, Hong Kong 999077, Peoples R China
[2] Hubei Univ, Sch Educ, Wuhan 430062, Peoples R China
[3] Cent China Normal Univ, Natl Engn Res Ctr E Learning, Wuhan 430079, Peoples R China
Keywords
Speech emotion recognition; Intelligent education; Feature extraction; Swin Transformer; Classroom discourse analysis;
DOI
10.1016/j.neucom.2024.128711
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech emotion recognition (SER) has received increasing attention due to its extensive applications in many fields, especially the analysis of teacher-student dialogue in classroom environments. It can help teachers better understand students' emotions and adjust teaching activities accordingly. However, SER faces several challenges, such as the intrinsic ambiguity of emotions and the difficulty of interpreting emotions from speech in noisy environments. These issues can reduce recognition accuracy by directing attention to less relevant or insignificant features. To address these challenges, this paper presents ESERNet, a Transformer-based model designed to extract crucial clues from speech data by capturing both pivotal cues and long-range relationships in the speech signal. The major contribution of our approach is a two-pathway SER framework. By leveraging the Transformer architecture, ESERNet captures long-range dependencies within speech mel-spectrograms, enabling a refined understanding of the emotional cues embedded in speech signals. Extensive experiments were conducted on the IEMOCAP and EmoDB datasets; the results show that ESERNet achieves state-of-the-art performance in SER and outperforms existing methods by effectively leveraging critical clues and capturing long-range dependencies in speech data. These results highlight the model's effectiveness in addressing the complex challenges of SER tasks.
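The abstract describes a pipeline whose input is the log-mel spectrogram of the raw speech waveform, which the Transformer then processes for long-range structure. As an illustration only (not the authors' code), the following is a minimal NumPy sketch of that spectrogram front end, with hypothetical parameter choices (16 kHz audio, 512-point FFT, 256-sample hop, 64 mel bands):

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel-scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Frame the signal, window each frame, take the power spectrum,
    # project onto the mel filterbank, then log-compress
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2      # (frames, n_fft//2+1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T      # (frames, n_mels)
    return np.log(mel + 1e-10)

# One second of a 440 Hz tone as a stand-in for a speech clip
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(spec.shape)  # (61, 64): 61 time frames x 64 mel bands
```

The resulting time-frequency matrix is what a vision-style Transformer such as Swin treats as a 2-D image, attending within shifted windows to relate distant time frames and frequency bands.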
Pages: 12
Related Papers
15 total
  • [1] MelTrans: Mel-Spectrogram Relationship-Learning for Speech Emotion Recognition via Transformers
    Li, Hui
    Li, Jiawen
    Liu, Hai
    Liu, Tingting
    Chen, Qiang
    You, Xinge
    SENSORS, 2024, 24 (17)
  • [2] Experimental Analysis and Selection of Spectrogram Features for Speech Emotion Recognition
    Tang, Gui-Chen
    Liang, Rui-Yu
    Feng, Yue-Qin
    Wang, Qing-Yun
    INTERNATIONAL CONFERENCE ON MECHANICS, BUILDING MATERIAL AND CIVIL ENGINEERING (MBMCE 2015), 2015, : 757 - 762
  • [3] Learning Mutual Correlation in Multimodal Transformer for Speech Emotion Recognition
    Wang, Yuhua
    Shen, Guang
    Xu, Yuezhu
    Li, Jiahang
    Zhao, Zhengdao
    INTERSPEECH 2021, 2021, : 4518 - 4522
  • [4] Investigation of the effect of spectrogram images and different texture analysis methods on speech emotion recognition
    Ozseven, Turgut
    APPLIED ACOUSTICS, 2018, 142 : 70 - 77
  • [5] Effective MLP and CNN based ensemble learning for speech emotion recognition
    Middya, A. I.
    Nag, B.
    Roy, S.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (36): 83963 - 83990
  • [6] On the Effect of Log-Mel Spectrogram Parameter Tuning for Deep Learning-Based Speech Emotion Recognition
    Mukhamediya, Azamat
    Fazli, Siamac
    Zollanvari, Amin
    IEEE ACCESS, 2023, 11 : 61950 - 61957
  • [7] Speech Emotion Recognition using Feature Selection with Adaptive Structure Learning
    Rayaluru, Akshay
    Bandela, Surekha Reddy
    Kumar, T. Kishore
    2019 IEEE INTERNATIONAL SYMPOSIUM ON SMART ELECTRONIC SYSTEMS (ISES 2019), 2019, : 233 - 236
  • [8] Focus-attention-enhanced Crossmodal Transformer with Metric Learning for Multimodal Speech Emotion Recognition
    Kim, Keulbit
    Cho, Namhyun
    INTERSPEECH 2023, 2023, : 2673 - 2677
  • [9] Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition
    Park, Sunchan
    Kim, Hyung Soon
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): 515 - 522
  • [10] A Study on a Speech Emotion Recognition System with Effective Acoustic Features Using Deep Learning Algorithms
    Byun, Sung-Woo
    Lee, Seok-Pil
    APPLIED SCIENCES-BASEL, 2021, 11 (04): 1 - 15