Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition

Times cited: 0
Authors
Ranjan, Sumit [1]
Chakraborty, Rupayan [1]
Kopparapu, Sunil Kumar [1]
Affiliation
[1] Tata Consultancy Services Ltd, TCS Research, Bengaluru, India
Source
INTERSPEECH 2024 | 2024
Keywords
speech emotion recognition; noise robustness; selective data augmentation; reinforcement learning
DOI
10.21437/Interspeech.2024-921
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech emotion recognition (SER) is an indispensable component of human-machine interaction and enables building empathetic voice user interfaces. The ability to accurately recognize emotion in noisy environments is important in practical scenarios where a person interacts with a machine or an agent, as in a voice-based call center. In this paper, we propose a reinforcement learning (RL) based data augmentation technique for building a noise-robust SER system. The reward function used in RL allows noises spread over different frequency bands to be picked selectively for data augmentation. We show that the proposed RL-based augmentation technique is superior to a recently proposed random-selection-based technique for the noise-robust SER task. We validate the proposed technique on the IEMOCAP dataset with four emotion classes. Moreover, we test the noise robustness of the SER system in both cross-corpus and cross-language scenarios.
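The abstract describes the method only at a high level, so the following is not the authors' implementation. It is a minimal sketch under stated assumptions. First, it swaps in an epsilon-greedy multi-armed bandit, one of the simplest RL formulations, as a stand-in for the paper's RL agent. Second, the noise categories (`low_band`, `mid_band`, `high_band`), the helper names, and the reward function are all hypothetical. Third, augmentation is assumed to mean adding noise to clean speech at a target signal-to-noise ratio (SNR).

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    Additive mixing at a fixed SNR is an ASSUMED augmentation scheme, not taken from the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(n * n for n in noise[: len(speech)]) / len(speech)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Choose gain g so that p_speech / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(speech, noise)]


class EpsilonGreedyNoiseSelector:
    """Epsilon-greedy bandit over noise categories (HYPOTHETICAL stand-in for the paper's RL agent)."""

    def __init__(self, arms: Sequence[str], epsilon: float = 0.1, seed: int = 0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = {a: 0 for a in self.arms}
        self.values: Dict[str, float] = {a: 0.0 for a in self.arms}

    def select(self) -> str:
        # Explore a random category with probability epsilon; otherwise exploit
        # the category with the highest running mean reward (ties -> first arm).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm: str, reward: float) -> None:
        # Incremental running mean: Q <- Q + (r - Q) / n.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_selection(
    selector: EpsilonGreedyNoiseSelector, reward_fn: Callable[[str], float], steps: int
) -> Dict[str, int]:
    """Repeatedly pick a noise category, observe its reward, and update the estimates."""
    for _ in range(steps):
        arm = selector.select()
        selector.update(arm, reward_fn(arm))
    return dict(selector.counts)


if __name__ == "__main__":
    # Toy constant rewards per (hypothetical) noise category, for illustration only.
    toy_rewards = {"low_band": 0.1, "mid_band": 0.3, "high_band": 0.5}
    selector = EpsilonGreedyNoiseSelector(list(toy_rewards), epsilon=0.1, seed=0)
    print(run_selection(selector, lambda arm: toy_rewards[arm], steps=1000))
    print(mix_at_snr([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, 1.0, 1.0], snr_db=20.0))
```

In a real pipeline the hypothetical `reward_fn` would presumably train or fine-tune the SER model on data augmented with the chosen noise category and return a validation score. The toy constant rewards here only show that the selector concentrates its choices on the highest-reward category, in contrast to uniform random selection.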
Pages: 1040-1044
Number of pages: 5
References
23 in total
  • [1] Burkhardt F., 2005, INTERSPEECH, p. 1517, DOI 10.21437/Interspeech.2005-446
  • [2] Busso C., Bulut M., Lee C.-C., Kazemzadeh A., Mower E., Kim S., Chang J. N., Lee S., Narayanan S. S., 2008, IEMOCAP: interactive emotional dyadic motion capture database, Language Resources and Evaluation, 42(4), pp. 335-359
  • [3] Carr T., 2018, arXiv preprint arXiv:1812.07452
  • [4] Chakraborty R., 2017, Analyzing emotion in spontaneous speech, DOI 10.1007/978-981-10-7674-9
  • [5] Chakraborty R., Panda A., Pandharipande M., Joshi S., Kopparapu S. K., 2019, Front-end Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition, INTERSPEECH 2019, pp. 3257-3261
  • [6] de Lope J., Grana M., 2023, An ongoing review of speech emotion recognition, Neurocomputing, 528, pp. 1-11
  • [7] Eyben F., 2010, Proc. 18th ACM Int. Conf. Multimedia, p. 1459, DOI 10.1145/1873951.1874246
  • [8] Heracleous P., 2017, Int. Conf. Affective Computing and Intelligent Interaction (ACII), p. 262, DOI 10.1109/ACII.2017.8273610
  • [9] Lakomkin E., 2018, IEEE INT CONF ROBOT, p. 4445
  • [10] Latif S., Cuayahuitl H., Pervez F., Shamshad F., Ali H. S., Cambria E., 2023, A survey on deep reinforcement learning for audio-based applications, Artificial Intelligence Review, 56(3), pp. 2193-2240