Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification

Cited by: 9
Authors
Amjad, Ammar [1 ]
Khan, Lal [1 ]
Chang, Hsien-Tsung [1 ,2 ,3 ,4 ]
Affiliations
[1] Chang Gung Univ, Dept Comp Sci & Informat Engn, Taoyuan 33302, Taiwan
[2] Chang Gung Mem Hosp, Dept Phys Med & Rehabil, Taoyuan 33302, Taiwan
[3] Chang Gung Univ, Artificial Intelligence Res Ctr, Taoyuan 33302, Taiwan
[4] Chang Gung Univ, Bachelor Program Artificial Intelligence, Taoyuan 33302, Taiwan
Keywords
spontaneous database; semi-natural database; speech emotion recognition; multiple feature fusion; support vector machine; EMOTION RECOGNITION; INFORMATION; SPACE;
DOI
10.3390/pr9122286
Chinese Library Classification
TQ [Chemical Industry];
Subject Classification Code
0817;
Abstract
Identifying speech emotions in spontaneous databases has recently become a complex and demanding research area. This research presents a new approach for recognizing semi-natural and spontaneous speech emotions with multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain redundant and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) is used to identify the most discriminative audio feature map after the relevant features are learned by the fusion approach. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed speaker-independent identification accuracies with the SVM of 76% and 59%, respectively. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the proposed framework outperforms current state-of-the-art techniques on semi-natural and spontaneous datasets.
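The pipeline summarized in the abstract (hybrid acoustic feature extraction, fusion of the feature sets into one representation, and SVM-based emotion classification) can be sketched as follows. This is a minimal illustration, assuming librosa MFCC and spectral-contrast features as stand-ins for the paper's hybrid feature sets and an RBF-kernel SVM; the feature choices, statistics, and hyperparameters are assumptions, not the authors' implementation.

# Hedged sketch of fused acoustic features + SVM emotion classification.
# Feature sets (MFCC, spectral contrast) are illustrative assumptions only.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_fused_features(wav_path, sr=16000):
    """Extract two acoustic feature sets and fuse them by concatenation."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, T)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)  # (7, T)
    # Utterance-level statistics (mean + std) yield a fixed-length vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           contrast.mean(axis=1), contrast.std(axis=1)])

def train_emotion_svm(wav_paths, labels):
    """Fit an SVM classifier on the fused utterance-level features."""
    X = np.stack([extract_fused_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, labels)
    return clf

In practice, speaker-independent evaluation (as reported in the abstract) would split the training and test utterances by speaker identity rather than at random.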
Pages: 16