Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition

Cited by: 0

Authors:

Ranjan, Sumit [1]

Chakraborty, Rupayan [1]

Kopparapu, Sunil Kumar [1]

Affiliations:

[1] Tata Consultancy Serv Ltd, TCS Res, Bengaluru, India

Source:

INTERSPEECH 2024 | 2024

Keywords:

speech emotion recognition; noise robustness; selective data augmentation; reinforcement learning

DOI:

10.21437/Interspeech.2024-921

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech emotion recognition (SER) is an indispensable component of human-machine interaction and enables building empathetic voice user interfaces. The ability to accurately recognize emotion in noisy environments is important in practical scenarios where a person is interacting with a machine or an agent, as in the case of a voice-based call center. In this paper, we propose a reinforcement learning (RL) based data augmentation technique to enable building a robust SER system. The reward function used in RL enables selective picking of noises spread over different frequency bands for data augmentation. We show that the proposed RL-based augmentation technique is superior to a recently proposed random-selection-based technique for the noise-robust SER task. We use the IEMOCAP dataset with four emotion classes to validate the proposed technique. Moreover, we test the noise robustness of the SER system in both cross-corpus and cross-language scenarios.

Pages: 1040-1044

Page count: 5

