Speaker-independent speech emotion recognition by fusion of functional and accompanying paralanguage features

Cited by: 5
Authors
Mao, Qi-rong [1 ]
Zhao, Xiao-lei [1 ]
Huang, Zheng-wei [1 ]
Zhan, Yong-zhao [1 ]
Affiliations
[1] Jiangsu Univ, Dept Comp Sci & Commun Engn, Zhenjiang 212013, Peoples R China
Source
JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE C-COMPUTERS & ELECTRONICS | 2013, Vol. 14, No. 7
Funding
National Natural Science Foundation of China
Keywords
Speech emotion recognition; Speaker-independent; Functional paralanguage; Fusion algorithm; Recognition accuracy; DISCRIMINATION; LAUGHTER;
DOI
10.1631/jzus.CIDE1310
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
Functional paralanguage carries considerable emotional information and is insensitive to speaker changes. To improve emotion recognition accuracy under speaker-independent conditions, a fusion method that combines functional paralanguage features with the accompanying paralanguage features is proposed for speaker-independent speech emotion recognition. With this method, functional paralanguage such as laughter, crying, and sighing is used to assist speech emotion recognition. The contributions of our work are threefold. First, an emotional speech database containing six kinds of functional paralanguage and six typical emotions was recorded by our research group. Second, functional paralanguage is introduced for speech emotion recognition in combination with the accompanying paralanguage features. Third, a fusion algorithm based on confidences and probabilities is proposed to combine the functional paralanguage features with the accompanying paralanguage features. We evaluate the usefulness of the functional paralanguage features and the fusion algorithm in terms of precision, recall, and F1-measure on the emotional speech database recorded by our research group. Using the functional paralanguage features, the overall recognition accuracy for the six emotions exceeds 67% in the speaker-independent condition.
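The abstract names a fusion rule based on confidences and probabilities but does not specify it. The sketch below is a minimal illustration of one such scheme, assuming two classifiers (one over functional-paralanguage features, one over the accompanying paralanguage features) that each output per-class posterior probabilities, fused by a confidence-derived weight; the function names, the six emotion labels, and the weighting scheme are all assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def fuse_predictions(p_functional, p_accompanying,
                     conf_functional, conf_accompanying):
    """Confidence-weighted fusion of two per-class probability vectors.

    p_functional, p_accompanying : np.ndarray of shape (n_classes,),
        posterior probabilities from each classifier (each sums to 1).
    conf_functional, conf_accompanying : float in (0, 1], a confidence
        score for each classifier (e.g., its max posterior or a
        held-out accuracy estimate -- an assumed choice here).
    Returns the index of the winning emotion class.
    """
    # Weight each classifier by its relative confidence.
    w = conf_functional / (conf_functional + conf_accompanying)
    fused = w * p_functional + (1.0 - w) * p_accompanying
    return int(np.argmax(fused))

# Hypothetical usage; the abstract says "six typical emotions" without
# naming them, so these labels are placeholders.
emotions = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
p_func = np.array([0.05, 0.05, 0.10, 0.60, 0.10, 0.10])  # functional-paralanguage classifier
p_acc  = np.array([0.10, 0.10, 0.15, 0.35, 0.20, 0.10])  # accompanying-paralanguage classifier
label = fuse_predictions(p_func, p_acc,
                         conf_functional=0.8, conf_accompanying=0.6)
print(emotions[label])  # -> "happiness"
```

A weighted sum of posteriors is only one plausible reading of "based on confidences and probabilities"; the paper itself should be consulted for the exact rule.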
Pages: 573-582
Number of pages: 10