Neural network and GMM based feature mappings for consonant–vowel recognition in emotional environment

Cited by: 2

Authors:

Yadav J. [1]

Rao K.S. [1]

Affiliation:

[1] Computer Science and Engineering, Indian Institute of Technology, Kharagpur

Source:

International Journal of Speech Technology | 2018 / Vol. 21 / No. 3

Keywords:

Consonant–vowel recognition; Feature mapping; Gaussian mixture model; Hidden Markov model; Neural network; Vowel onset and offset point

DOI:

10.1007/s10772-017-9478-1

Abstract:

In this work, we propose a feature transformation framework based on mapping functions for developing a consonant–vowel (CV) recognition system for emotional environments. Expressing emotions is an effective way of conveying messages in human conversation. The characteristics of CV units differ from one emotion to another, and the performance of existing CV recognition systems degrades in emotional environments. We therefore propose mapping functions based on artificial neural networks and Gaussian mixture models (GMMs) to increase the accuracy of CV recognition in emotional environments. The proposed mapping functions transform emotional features to neutral features at the CV and phone levels, minimizing the mismatch between training and testing environments. Vowel onset and offset points are used to identify vowel, consonant and transition segments. Transition segments are identified by taking the initial 15% of the speech samples between the vowel onset and offset points. Feature mapping at the phone level significantly increases the average performance of the CV recognition system in three emotional environments (anger, happiness, and sadness). © 2017, Springer Science+Business Media, LLC, part of Springer Nature.
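
The 15% rule for transition segments can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_cv_unit`, its parameters, and the dictionary layout are hypothetical. The sketch also assumes that the consonant region is everything before the vowel onset point and that the vowel region is the remainder of the onset-to-offset span after the transition; the abstract states only the 15% rule itself.

```python
def split_cv_unit(num_samples, vop, vep, transition_fraction=0.15):
    """Split one CV unit into consonant, transition and vowel regions.

    Hypothetical helper for illustration only.
    num_samples: total number of speech samples in the CV unit.
    vop, vep: sample indices of the vowel onset and offset points.
    Returns half-open (start, end) sample ranges for each region.
    """
    if not 0 <= vop < vep <= num_samples:
        raise ValueError("expected 0 <= vop < vep <= num_samples")

    # Transition: the initial 15% of the samples between onset and offset.
    transition_len = int(round(transition_fraction * (vep - vop)))
    transition_end = vop + transition_len

    return {
        # Assumption: the consonant occupies everything before the onset.
        "consonant": (0, vop),
        "transition": (vop, transition_end),
        # Assumption: the vowel is the rest of the onset-to-offset span.
        "vowel": (transition_end, vep),
    }


regions = split_cv_unit(num_samples=2000, vop=400, vep=1400)
print(regions)
```

With an onset at sample 400 and an offset at sample 1400, the span holds 1000 samples, so the transition region covers the first 150 of them.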

Pages: 421-433

Page count: 12
