Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification

Cited by: 9
Authors
Amjad, Ammar [1 ]
Khan, Lal [1 ]
Chang, Hsien-Tsung [1 ,2 ,3 ,4 ]
Affiliations
[1] Chang Gung Univ, Dept Comp Sci & Informat Engn, Taoyuan 33302, Taiwan
[2] Chang Gung Mem Hosp, Dept Phys Med & Rehabil, Taoyuan 33302, Taiwan
[3] Chang Gung Univ, Artificial Intelligence Res Ctr, Taoyuan 33302, Taiwan
[4] Chang Gung Univ, Bachelor Program Artificial Intelligence, Taoyuan 33302, Taiwan
Keywords
spontaneous database; semi-natural database; speech emotion recognition; multiple feature fusion; support vector machine; EMOTION RECOGNITION; INFORMATION; SPACE;
DOI
10.3390/pr9122286
Chinese Library Classification (CLC): TQ [Chemical Industry]
Discipline Code: 0817
Abstract
Identifying speech emotions in spontaneous databases has recently become a complex and demanding research area. This work presents a new approach for recognizing semi-natural and spontaneous speech emotions using multiple feature fusion and deep neural networks (DNNs). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) is used to identify the most discriminative audio feature map after the fusion approach has learned the relevant features. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed speaker-independent identification accuracies of 76% and 59% with the SVM, respectively. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the suggested framework outperforms current state-of-the-art techniques on semi-natural and spontaneous datasets.
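The abstract's core pipeline (feature-level fusion of hybrid acoustic feature sets, followed by an SVM classifier) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: the feature dimensions, random stand-in features, and six-class label set are assumptions, and the actual framework additionally involves a DNN for feature learning.

```python
# Hypothetical sketch: feature-level fusion of two acoustic feature sets + SVM.
# Random arrays stand in for real per-utterance features (e.g., MFCC and
# prosodic statistics extracted from speech); dimensions are assumed.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_utterances, n_classes = 200, 6  # e.g., six basic emotion categories

# Stand-ins for two hybrid acoustic feature sets
mfcc_feats = rng.normal(size=(n_utterances, 39))      # assumed MFCC-like vector
prosodic_feats = rng.normal(size=(n_utterances, 16))  # assumed prosodic statistics
labels = rng.integers(0, n_classes, size=n_utterances)

# Feature-level fusion: concatenate the per-utterance feature vectors
fused = np.concatenate([mfcc_feats, prosodic_feats], axis=1)

# SVM on the fused feature map, with standardization
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, fused, labels, cv=5)
print(fused.shape, scores.mean())
```

With random features the cross-validation accuracy is near chance; the point is only the shape of the pipeline, not the reported results.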
Pages: 16