Speech emotion recognition based on multi-feature speed rate and LSTM

Cited by: 8
Authors
Yang, Zijun [1 ]
Li, Zhen [1 ]
Zhou, Shi [2 ]
Zhang, Lifeng [1 ]
Serikawa, Seiichi [1 ]
Affiliations
[1] Kyushu Inst Technol, 1-1 Sensuicho, Tobata Ward, Kitakyushu, Fukuoka 8040011, Japan
[2] Huzhou Univ, 759 East 2nd Rd, Huzhou 313000, Zhejiang, Peoples R China
Keywords
Speech emotion recognition; LSTM; Voiced sound; Phonogram; Short-time features; Depression; Severity; Signals
DOI
10.1016/j.neucom.2024.128177
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Correctly recognizing speech emotions is of significant importance in various fields, such as healthcare and human-computer interaction (HCI). However, the complexity of speech signal features poses challenges for speech emotion recognition. This study introduces a novel multi-feature method for speech emotion recognition that combines short-time and rhythmic features. Utilizing short-time energy, zero-crossing rate, and average amplitude difference, the proposed approach effectively addresses overfitting concerns by reducing feature dimensionality. Employing a long short-term memory (LSTM) network, the experiments achieved notable accuracy across diverse datasets. Specifically, the proposed method achieved an accuracy of up to 98.47% on the CASIA dataset, 100% on the Emo-DB dataset, and 98.87% on the EMOVO dataset, demonstrating its capability to accurately discern speaker emotions across different languages and emotion classes. These findings underscore the significance of incorporating speech rate for emotional content recognition, which holds promise for applications in HCI and auxiliary medical diagnostics.
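For illustration only, the sketch below shows one way the three short-time features named in the abstract (short-time energy, zero-crossing rate, and average amplitude difference) could be extracted frame by frame to form a per-frame feature sequence for an LSTM. This is not the authors' implementation; the frame length, hop size, and helper-function names are assumptions.

```python
# Minimal sketch of the three short-time features named in the abstract.
# Frame length (25 ms) and hop size (10 ms) at 16 kHz are illustrative assumptions.
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples per frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def avg_magnitude_difference(frames, lag=1):
    """Mean absolute difference between each frame and its lagged copy."""
    return np.mean(np.abs(frames[:, lag:] - frames[:, :-lag]), axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)          # 1 s of synthetic audio at 16 kHz
    frames = frame_signal(x)
    feats = np.stack([short_time_energy(frames),
                      zero_crossing_rate(frames),
                      avg_magnitude_difference(frames)], axis=1)
    print(feats.shape)                      # (n_frames, 3) sequence an LSTM could consume
```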
Pages: 12