Robust emotion recognition by spectro-temporal modulation statistic features

Cited by: 15
Authors
Chi, Tai-Shih [1]
Yeh, Lan-Ying [1]
Hsu, Chin-Cheng [1]
Affiliation
[1] Natl Chiao Tung Univ, Dept Elect Engn, Hsinchu 300, Taiwan
Keywords
Robust emotion recognition; Spectro-temporal modulation; Speech recognition; Frequency
DOI
10.1007/s12652-011-0088-5
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most speech emotion recognition studies consider clean speech. In this study, statistics of joint spectro-temporal modulation features are extracted from an auditory perceptual model and used to detect the emotional state of speech under noisy conditions. Speech samples were taken from the Berlin Emotional Speech database and corrupted with white and babble noise at various SNR levels. The study investigates a clean-train/noisy-test scenario to simulate practical conditions with unknown noise sources. Simulations demonstrate the redundancy of the proposed spectro-temporal modulation features and further examine dimensionality reduction. The proposed modulation features achieve higher emotion recognition rates under noisy conditions than (1) conventional mel-frequency cepstral coefficients combined with prosodic features and (2) the official acoustic features adopted in the INTERSPEECH 2009 Emotion Challenge. Adding the modulation features increased the recognition rates of the INTERSPEECH features by approximately 7% for all tested SNR conditions (20 to 0 dB).
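The corruption step described in the abstract, adding noise to clean speech at a chosen SNR, can be illustrated with a minimal sketch. This is not the authors' code: the function names `add_white_noise` and `measure_snr_db`, the Gaussian noise source, the fixed seed, and the 200 Hz tone standing in for a speech waveform are all assumptions made for illustration. The noise gain is chosen so that 10·log10(P_speech / P_noise) equals the target SNR.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(speech, target_snr_db, seed=0):
    """Return (noisy, scaled_noise) with Gaussian white noise at the target SNR.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    # Scale the noise so that P_speech / P_scaled_noise == 10 ** (SNR / 10).
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


def measure_snr_db(signal, noise):
    """SNR in dB computed from the separate signal and noise components."""
    return 10 * math.log10(power(signal) / power(noise))


# Stand-in for a clean utterance: one second of a 200 Hz tone at 16 kHz (assumed).
fs = 16000
speech = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]

for target in (20, 10, 0):
    noisy, noise = add_white_noise(speech, target)
    print(target, "dB requested, measured:", measure_snr_db(speech, noise))
```

Babble noise would be mixed the same way, with a recorded multi-talker signal replacing the generated Gaussian samples; only the gain computation carries over unchanged.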
Pages: 47-60
Page count: 14