Speech emotion recognition based on an improved brain emotion learning model

Cited by: 119
Authors
Liu, Zhen-Tao [1 ,2 ]
Xie, Qiao [1 ,2 ]
Wu, Min [1 ,2 ]
Cao, Wei-Hua [1 ,2 ]
Mei, Ying [3 ,4 ]
Mao, Jun-Wei [1 ,2 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Hubei, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Hubei, Peoples R China
[3] Hunan Univ Arts & Sci, Sch Elect & Informat Engn, Changde 415000, Peoples R China
[4] Cent South Univ, Sch Informat Sci & Engn, Changsha 410083, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech; Emotion recognition; Brain-inspired; Brain emotion learning; Genetic algorithm; Spectral features; Networks;
DOI
10.1016/j.neucom.2018.05.005
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Human-robot emotional interaction has developed rapidly in recent years, and speech emotion recognition plays a significant role in it. In this paper, a speech emotion recognition method based on an improved brain emotional learning (BEL) model is proposed, inspired by the emotional processing mechanism of the limbic system in the brain. The reinforcement learning rule of the standard BEL model, however, limits its adaptability and degrades its performance. To address this, a Genetic Algorithm (GA) is employed to update the weights of the BEL model. The proposal is tested on the CASIA Chinese emotion corpus, the SAVEE emotion corpus, and the FAU Aibo dataset, from which MFCC-related features and their first-order delta coefficients are extracted. In addition, the proposal is tested on the INTERSPEECH 2009 standard feature set, where three dimensionality reduction methods, Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), and PCA+LDA, are used to reduce the dimension of the feature set. The experimental results show that the proposed method obtains average recognition accuracies of 90.28% (CASIA), 76.40% (SAVEE), and 71.05% (FAU Aibo) for speaker-dependent (SD) speech emotion recognition, and highest average accuracies of 38.55% (CASIA), 44.18% (SAVEE), and 64.60% (FAU Aibo) for speaker-independent (SI) speech emotion recognition, which indicates that the proposal is feasible for speech emotion recognition. (C) 2018 Elsevier B.V. All rights reserved.
Pages: 145-156
Page count: 12
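
As a hedged illustration of the pipeline the abstract describes, the sketch below extracts MFCC-related features with their first-order delta coefficients and applies a PCA+LDA dimensionality reduction. librosa and scikit-learn are illustrative choices rather than the authors' tooling, and mean pooling over frames is just one simple way to obtain an utterance-level feature vector; the paper's exact feature statistics may differ.

```python
# Sketch, assuming librosa and scikit-learn; not the authors' implementation.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def mfcc_delta_features(wav_path, n_mfcc=13):
    """Per-utterance feature vector: frame-averaged MFCCs and their deltas."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    delta = librosa.feature.delta(mfcc)                     # 1st-order deltas
    feats = np.vstack([mfcc, delta])                        # (2*n_mfcc, frames)
    return feats.mean(axis=1)  # mean pooling is an illustrative choice

def reduce_pca_lda(X_train, y_train, X_test, n_pca=50):
    """PCA followed by LDA, mirroring the PCA+LDA configuration in the abstract."""
    pca = PCA(n_components=n_pca).fit(X_train)
    X_tr, X_te = pca.transform(X_train), pca.transform(X_test)
    lda = LinearDiscriminantAnalysis().fit(X_tr, y_train)
    return lda.transform(X_tr), lda.transform(X_te)
```

Note that LDA projects to at most one fewer dimension than the number of emotion classes, which is one reason PCA is commonly applied first when starting from a large feature set such as the INTERSPEECH 2009 set.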
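
The abstract states that a genetic algorithm replaces the BEL model's reinforcement learning rule for updating its weights. A minimal generic GA over a flat weight vector might look as follows; the BEL model itself is abstracted behind a fitness function, and the population size, truncation selection, averaging crossover, and Gaussian mutation are illustrative assumptions, not details taken from the paper.

```python
# Generic GA sketch; `fitness` stands in for running the BEL classifier with a
# candidate weight vector and returning validation accuracy (hypothetical).
import numpy as np

rng = np.random.default_rng(0)

def evolve_weights(fitness, dim, pop_size=40, generations=100, mutation_scale=0.1):
    pop = rng.normal(size=(pop_size, dim))                  # random initial population
    for _ in range(generations):
        scores = np.array([fitness(w) for w in pop])        # evaluate candidates
        elite = pop[np.argsort(scores)[-(pop_size // 2):]]  # keep the better half
        # Crossover: average two randomly chosen elite parents per child.
        parents = elite[rng.integers(len(elite), size=(pop_size, 2))]
        children = parents.mean(axis=1)
        # Mutation: small Gaussian perturbation of every child.
        pop = children + rng.normal(scale=mutation_scale, size=children.shape)
    scores = np.array([fitness(w) for w in pop])
    return pop[np.argmax(scores)]                           # best weights found

# Hypothetical usage: validate_bel would train/evaluate the BEL model with w
# and return its recognition accuracy on a held-out set.
# best_w = evolve_weights(lambda w: validate_bel(w), dim=128)
```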