Front-End Feature Compensation for Noise Robust Speech Emotion Recognition

Cited by: 1

Authors:

Pandharipande, Meghna [1]

Chakraborty, Rupayan [1]

Panda, Ashish [1]

Das, Biswajit [1]

Kopparapu, Sunil Kumar [1]

Affiliation:

[1] TCS Res & Innovat Mumbai, Yantra Pk, Thana 400601, Maharashtra, India

Source:

2019 27th European Signal Processing Conference (EUSIPCO) | 2019

Keywords:

Emotion recognition; Noisy speech; Feature compensation; Auditory masking; Vector Taylor Series;

DOI:

10.23919/eusipco.2019.8902981

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Robust feature compensation and selection are important aspects of the noisy speech emotion recognition (SER) task, especially in mismatched conditions, where the models are trained on clean speech and tested in noisy scenarios. Here we propose the use of front-end feature compensation techniques based on Vector Taylor Series (VTS) expansion and VTS with auditory masking (VTS-AM) to improve the performance of SER systems. On top of VTS and VTS-AM, we compare the performance of log compression and root compression applied to the mel-filter-bank energies. Further, we demonstrate the benefit of feature selection applied to the non-MFCC high-level descriptors in conjunction with VTS, VTS-AM and root compression. The system performance is compared with the popular Non-negative Matrix Factorization (NMF) based enhancement and an energy-based voice activity detector (VAD) technique, which discards silent or noisy frames in the spoken utterances. To demonstrate the efficacy of our proposed techniques, extensive experiments are conducted on two standard datasets (EmoDB and IEMOCAP), contaminated with five types of noise (Babble, F-16, Factory, Volvo, and HF-channel) from the Noisex-92 noise database at five SNR levels (0 dB, 5 dB, 10 dB, 15 dB and 20 dB).
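Two steps named in the abstract, contaminating clean speech with noise at a fixed SNR and applying log or root compression to filter-bank energies, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mix_at_snr` and `compress` are hypothetical, the noise is simply tiled or truncated to the length of the speech, and the root exponent `root=0.1` is an assumed placeholder, since the abstract does not state the value used.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)

    # Tile or truncate the noise so it covers the whole clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR(dB) = 10 * log10(clean_power / scaled_noise_power)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return clean + gain * noise


def compress(energies, mode="log", root=0.1, floor=1e-10):
    """Apply log or root compression to non-negative filter-bank energies.

    `root` is an assumed exponent; the abstract does not specify one.
    """
    energies = np.maximum(np.asarray(energies, dtype=float), floor)
    if mode == "log":
        return np.log(energies)
    if mode == "root":
        return energies ** root
    raise ValueError("mode must be 'log' or 'root'")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)   # stand-in for a clean utterance
    babble = rng.standard_normal(4000)    # stand-in for a noise recording
    for snr in (0, 5, 10, 15, 20):        # SNR levels listed in the abstract
        noisy = mix_at_snr(speech, babble, snr)
        print(snr, noisy.shape)
```

Root compression keeps small energies bounded near zero, whereas the logarithm diverges towards negative infinity as the energy approaches zero. That difference is one commonly cited motivation for trying root compression on noisy features.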


Pages: 5
