Enhancing Speech Emotion Recognition for Real-World Applications via ASR Integration

Authors
Li, Yuanchao [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Keywords
speech emotion recognition; self-supervised learning; semi-supervised learning; multimodal fusion; ASR; FEATURES
DOI
10.1109/ACIIW59127.2023.10388136
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech Emotion Recognition (SER) is a major focus of affective computing and has developed rapidly over the past few decades. However, the lack of large-scale emotional speech data has hindered its application. As a result, most SER research, conducted in the lab on existing corpora, does not transfer to in-the-wild conditions. To address this issue, we propose integrating Automatic Speech Recognition (ASR), which is already widely used in the wild, into SER for real-world applications. Specifically, we aim to investigate the mutual impact of speech and emotion recognition, both to understand how ASR performs on emotional speech and to leverage the representations and transcripts from ASR models in developing a robust SER model. This research is expected to alleviate the data scarcity problem in SER and enable its use in various real-world applications. Furthermore, our findings can expand the use of ASR to complicated speech scenarios (e.g., emotional speech) and advance other speech tasks (e.g., recognizing affective and health states) that face issues similar to those of SER.
Pages: 5
Related papers
  • [1] Speech Emotion Recognition Applied to Real-World Medical Consultation
    Huang, Ching-Tzu
    Huang, Chih-Wei
    Yang, Hsuan-Chia
    Li, Yu-Chuan
    MEDINFO 2023 - THE FUTURE IS ACCESSIBLE, 2024, 310 : 1121 - 1125
  • [2] Domain Adapting Deep Reinforcement Learning for Real-World Speech Emotion Recognition
    Rajapakshe, Thejan
    Rana, Rajib
    Khalifa, Sara
    Schuller, Bjoern W.
    IEEE ACCESS, 2024, 12 : 193101 - 193114
  • [3] Study on Speaker-Independent Emotion Recognition from Speech on Real-World Data
    Kostoulas, Theodoros
    Ganchev, Todor
    Fakotakis, Nikos
    VERBAL AND NONVERBAL FEATURES OF HUMAN-HUMAN AND HUMAN-MACHINE INTERACTIONS, 2008, 5042 : 235 - 242
  • [4] Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
    Schruefer, Oliver
    Milling, Manuel
    Burkhardt, Felix
    Eyben, Florian
    Schuller, Bjoern
    INTERSPEECH 2024, 2024, : 3210 - 3214
  • [5] The Impact of Face Mask and Emotion on Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER)
    Oh, Qi Qi
    Seow, Chee Kiat
    Yusuff, Mulliana
    Pranata, Sugiri
    Cao, Qi
    2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 523 - 531
  • [6] Application of Transfer Learning-Based English Speech Emotion Recognition in Real-World Scenarios
    Zhang, Ping
    PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND DIGITAL APPLICATIONS, MIDA2024, 2024, : 224 - 229
  • [7] EEGNetT: EEG-based neural network for emotion recognition in real-world applications
    Zhu, Yuxuan
    Ozawa, Kenji
    Kong, Wanzeng
    2021 IEEE 3RD GLOBAL CONFERENCE ON LIFE SCIENCES AND TECHNOLOGIES (IEEE LIFETECH 2021), 2021, : 376 - 378
  • [8] EMOTION RECOGNITION FROM SPEECH: PUTTING ASR IN THE LOOP
    Schuller, Bjoern
    Batliner, Anton
    Steidl, Stefan
    Seppi, Dino
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4585 - +
  • [9] Improving Speech Emotion Recognition via Fine-tuning ASR with Speaker Information
    Ta, Bao Thang
    Nguyen, Tung Lam
    Dang, Dinh Son
    Le, Nhat Minh
    Do, Van Hai
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1596 - 1601
  • [10] Segmentation and its real-world applications in speech processing
    Sattar, Farook
    Nilsson, Mikael
    Claesson, Ingvar
    2007 9TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1-3, 2007, : 788 - +