Talking condition recognition in stressful and emotional talking environments based on CSPHMM2s

Cited by: 19
Authors
Shahin I. [1 ]
Ba-Hutair M.N. [1 ]
Affiliations
[1] Department of Electrical and Computer Engineering, University of Sharjah, P. O. Box 27272, Sharjah
Keywords
Emotional talking environments; Hidden Markov models; Second-order circular hidden Markov models; Second-order circular suprasegmental hidden Markov models; Stressful talking environments; Suprasegmental hidden Markov models
DOI
10.1007/s10772-014-9251-7
Abstract
This work is aimed at exploiting second-order circular suprasegmental hidden Markov models (CSPHMM2s) as classifiers to enhance talking condition recognition in stressful and emotional talking environments (two completely separate environments). The stressful talking environment in this work uses the Speech Under Simulated and Actual Stress (SUSAS) database, while the emotional talking environment uses the Emotional Prosody Speech and Transcripts database. The results of this work using mel-frequency cepstral coefficients (MFCCs) demonstrate that CSPHMM2s outperform each of hidden Markov models, second-order circular hidden Markov models, and suprasegmental hidden Markov models in enhancing talking condition recognition in the stressful and emotional talking environments. The results also show that, based on CSPHMM2s, talking condition recognition performance in stressful talking environments exceeds that in emotional talking environments by 3.67%. Results obtained in subjective evaluation by human judges fall within 2.14% and 3.08% of those obtained with CSPHMM2s in stressful and emotional talking environments, respectively. © 2014, Springer Science+Business Media New York.
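The classification scheme the abstract describes boils down to training one HMM per talking condition on MFCC feature sequences and assigning a test utterance to the condition whose model scores it highest. The sketch below illustrates that maximum-likelihood decision rule with a toy discrete HMM and the scaled forward algorithm; the state/emission parameters, the two-symbol alphabet, and the condition names are invented for illustration, and the sketch omits the circular topology, second-order dependence, and suprasegmental layer that distinguish CSPHMM2s.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]          # initial state distribution times emission
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha /= scale                     # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate through transitions, weight by emission
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha /= scale
    return loglik

# Two toy talking-condition models over a 2-symbol feature alphabet.
# These parameters are illustrative only, not taken from the paper.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
models = {
    "neutral": (pi, A, np.array([[0.8, 0.2], [0.8, 0.2]])),
    "angry":   (pi, A, np.array([[0.2, 0.8], [0.2, 0.8]])),
}

def classify(obs, models):
    """Assign the condition whose HMM gives the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_loglik(obs, *models[name]))

print(classify([1, 1, 1, 1], models))  # → angry
```

In a real system each condition's model would be trained (e.g. via Baum-Welch) on MFCC vectors quantized or modeled with Gaussian mixtures; the decision rule stays the same.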
Pages: 77-90
Page count: 13
References
32 references in total
[1]  
Bou-Ghazale S.E., Hansen J.H.L., A comparative study of traditional and newly proposed features for recognition of speech under stress, IEEE Transactions on Speech and Audio Processing, 8, 4, pp. 429-442, (2000)
[2]  
Campbell W.M., Campbell J.R., Reynolds D.A., Singer E., Torres-Carrasquillo P.A., Support vector machines for speaker and language recognition, Computer Speech and Language, 20, pp. 210-229, (2006)
[3]  
Casale S., Russo A., Serrano S., Multistyle classification of speech under stress using feature subset selection based on genetic algorithms, Speech Communication, 49, 10-11, pp. 801-810, (2007)
[4]  
Casale S., Russo A., Serrano S., Multistyle classification of speech under stress using feature subset selection based on genetic algorithms, Speech Communication, 49, 10-11, pp. 801-810, (2007)
[5]  
Chen Y., Cepstral domain talker stress compensation for robust speech recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, 36, 4, pp. 433-439, (1988)
[6]  
Cowie R., Douglas-Cowie E., Tsapatsoulis N., Votsis G., Kollias S., Fellenz W., Taylor J., Emotion recognition in human-computer interaction, IEEE Signal Processing Magazine, 18, 1, pp. 32-80, (2001)
[7]  
Davis S., Mermelstein P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech and Signal Processing, 28, 4, pp. 357-366, (1980)
[8]  
Falk T.H., Chan W.Y., Modulation spectral features for robust far-field speaker identification, IEEE Transactions on Audio, Speech and Language Processing, 18, 1, pp. 90-100, (2010)
[9]  
Fragopanagos N., Taylor J.G., Emotion recognition in human-computer interaction, Neural Networks, 18, pp. 389-405, (2005)
[10]  
Hansen J.H.L., Bou-Ghazale S., Getting started with SUSAS: A speech under simulated and actual stress database, EUROSPEECH-97: European Conference on Speech Communication and Technology, Rhodes, Greece, pp. 1743-1746, (1997)