REVISITING HIDDEN MARKOV MODELS FOR SPEECH EMOTION RECOGNITION

Cited by: 0
Authors
Mao, Shuiyang [1]
Tao, Dehua [1]
Zhang, Guangyan [1]
Ching, P. C. [1]
Lee, Tan [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
Speech emotion recognition; hidden Markov models; subspace based GMM; hybrid DNN-HMM
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Hidden Markov models (HMMs) have a long tradition in automatic speech recognition (ASR) because they capture the temporal dynamics of speech. For speech emotion recognition, this paper investigates and compares three HMM-based architectures: Gaussian mixture model based HMMs (GMM-HMMs), subspace-based Gaussian mixture model HMMs (SGMM-HMMs), and hybrid deep neural network HMMs (DNN-HMMs). Extensive emotion recognition experiments are carried out with all three architectures on the CASIA corpus, the Emo-DB corpus, and the IEMOCAP database, and the results are compared with those of state-of-the-art approaches. The HMM-based architectures prove to be effective models for speech emotion recognition, and their modeling accuracy is further improved by incorporating advanced techniques from the ASR field. Among the three architectures, the SGMM-HMMs achieve the best performance in most of the experiments.
Pages: 6715-6719
Page count: 5