Research on Chinese Speech Emotion Recognition Based on Deep Neural Network and Acoustic Features

Cited by: 1
Authors
Lee, Ming-Che [1 ]
Yeh, Sheng-Cheng [1 ]
Chang, Jia-Wei [2 ]
Chen, Zhen-Yi [1 ]
Affiliations
[1] Ming Chuan Univ, Dept Comp & Commun Engn, Taoyuan 333, Taiwan
[2] Natl Taichung Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taichung 404, Taiwan
Keywords
emotion recognition; deep neural network; acoustic features
DOI
10.3390/s22134744
Chinese Library Classification (CLC) Number
O65 [Analytical Chemistry];
Discipline Classification Codes
070302; 081704;
Abstract
In recent years, the use of artificial intelligence for emotion recognition has attracted much attention. Emotion recognition has broad industrial applicability and strong development potential. This research applies speech emotion recognition technology to Chinese speech; its main purpose is to help increasingly popular smart-home voice assistants and AI service robots move from touch-based interfaces to voice-based operation. The research proposes a specifically designed Deep Neural Network (DNN) model for a Chinese speech emotion recognition system, using 29 acoustic features from acoustic theory as the model's training attributes. It also proposes several audio adjustment methods to augment the dataset and improve training accuracy, including waveform adjustment, pitch adjustment, and pre-emphasis. The system achieved an average emotion recognition accuracy of 88.9% on the CASIA Chinese emotion corpus. The results show that the proposed deep learning model and audio adjustment methods can effectively identify the emotions of short Chinese sentences and can be applied to Chinese voice assistants or integrated with other dialogue applications.
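To make the described pipeline concrete, the following is a minimal Python sketch of the three audio adjustment methods named in the abstract and of a small feed-forward DNN over a 29-dimensional feature vector. It assumes the librosa and TensorFlow/Keras libraries; the amplitude factor, pitch-shift step, pre-emphasis coefficient, layer sizes, and six-class output are illustrative assumptions, not the paper's reported configuration.

import numpy as np
import librosa
import tensorflow as tf

N_FEATURES = 29  # the paper trains on 29 acoustic features
N_EMOTIONS = 6   # assumed six CASIA emotion classes

def augmented_copies(y, sr):
    """Yield augmented versions of waveform y (parameter values are illustrative)."""
    yield 0.8 * np.asarray(y)                                # waveform (amplitude) adjustment
    yield librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # pitch adjustment
    yield librosa.effects.preemphasis(y, coef=0.97)          # pre-emphasis filter

def build_dnn():
    """A small fully connected network over the 29-dimensional feature vector."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(N_FEATURES,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(N_EMOTIONS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

In practice, each augmented waveform would first be converted to the 29 acoustic features before being fed to the network; the feature extraction step itself is not reproduced in this sketch.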
Pages: 16