Prominence features: Effective emotional features for speech emotion recognition

Cited by: 44
Authors
Jing, Shaoling [1 ]
Mao, Xia [1 ]
Chen, Lijiang [1 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Prominence features; Speech annotation; Consistency assessment; Speech emotion recognition; FUNDAMENTAL-FREQUENCY; PERCEIVED PROMINENCE; AGREEMENT;
DOI
10.1016/j.dsp.2017.10.016
CLC classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline codes
0808; 0809;
Abstract
Emotion-related feature extraction is a challenging task in speech emotion recognition. Due to the lack of discriminative acoustic features, classical approaches based on traditional acoustic features cannot provide satisfactory performance. This research proposes a novel type of feature related to prominence, which is used together with traditional acoustic features to classify seven typical emotional states. To this end, the author group produced the Chinese Dual-mode Emotional Speech Database (CDESD), which contains additional prominence and paralinguistic annotation information. A consistency assessment algorithm is then presented to validate the reliability of this database's annotations; the results show that annotation consistency on prominence exceeds 60% on average. Subsequently, this research analyzes the correlation of the prominence features with emotional states using a curve-fitting method. Prominence is found to be closely related to emotional states, to retain emotional information at the word level to the greatest possible extent, and to play an important role in emotional expression. Finally, the proposed prominence features are validated on CDESD through speaker-dependent and speaker-independent experiments with four commonly used classifiers. The results show that the average recognition rate achieved using the combined features improves by 6% in speaker-dependent experiments and by 6.2% in speaker-independent experiments compared with using acoustic features alone. (C) 2017 Elsevier Inc. All rights reserved.
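The abstract does not reproduce the paper's consistency assessment algorithm; as an illustrative sketch only, a generic average pairwise percent-agreement measure over word-level prominence labels (hypothetical function and data, not the authors' method) might look like:

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """Average pairwise percent agreement on word-level labels.

    annotations: equal-length label sequences, one per annotator,
    e.g. 1 = prominent word, 0 = non-prominent. This is a standard
    agreement measure, not the algorithm from the paper itself.
    """
    pairs = list(combinations(annotations, 2))
    total = 0.0
    for a, b in pairs:
        matches = sum(1 for x, y in zip(a, b) if x == y)
        total += matches / len(a)
    return total / len(pairs)

# Three hypothetical annotators labeling a seven-word utterance
labels = [
    [1, 0, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 0],
]
print(round(pairwise_agreement(labels), 3))  # prints 0.81
```

Chance-corrected measures such as Cohen's or Fleiss' kappa are often preferred over raw percent agreement, since prominence labels are typically imbalanced toward non-prominent words.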
Pages: 216-231 (16 pages)