Research on lightweight speech emotion recognition in vehicle noisy environment

Citations: 0
Authors
Tang, Chunqiu [1]
Qi, Ao [1]
Xie, Bin [2]
Affiliations
[1] Wuhan Univ Technol, Sch Mech & Elect Engn, Wuhan 430070, Peoples R China
[2] Wuhan Baohua Display Technol Co Ltd, Wuhan, Peoples R China
Keywords
PNCC features; speech emotion recognition; vehicle noise; convolutional neural network; lightweight model;
DOI
10.1177/16878132241260585
Chinese Library Classification
O414.1 [Thermodynamics];
Discipline classification code
Abstract
To reduce traffic accidents caused by drivers' emotional states, this study proposes a speech emotion recognition algorithm designed for in-vehicle noise environments. The algorithm identifies a driver's emotional state and thereby provides a basis for subsequent measures to improve it. Existing speech emotion recognition research suffers from excessive model parameters, poor generalization, and suboptimal performance in noisy environments; to address these challenges, the paper presents a lightweight network model suitable for small datasets. The model takes Power Normalized Cepstral Coefficients (PNCC) as input features and applies parallel feature extraction layers at different scales. The extracted features are then fed into a feature learning module for deeper extraction, and the output layer makes the final determination of the driver's emotional state. Experimental results show that the model achieves an accuracy of 96.08% on the EMO-DB speech dataset and remains accurate and robust in simulated in-vehicle noise environments. Compared with other lightweight models, it has fewer trainable parameters and a faster processing speed, making it suitable for deployment on edge devices in mobile applications.
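The abstract mentions parallel feature extraction layers at different scales but gives no layer details. As a rough illustration of the general idea only, the Python sketch below runs several 1-D branches with different kernel widths over the same feature track and stacks their outputs per frame. The kernel widths (3, 5, 7), the fixed averaging weights, and the function names are assumptions made for this example; they are not the authors' architecture, and a trained model would learn its weights.

```python
import random


def conv1d_same(frames, kernel):
    """Slide a kernel over a 1-D sequence (cross-correlation, as CNN layers do),
    zero-padding both ends so the output has the same length as the input."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = [0.0] * pad_left + list(frames) + [0.0] * pad_right
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(frames))
    ]


def parallel_multiscale(frames, kernels):
    """Apply one branch per kernel to the same input, then stack the branch
    outputs so each time step holds one value per branch."""
    branches = [conv1d_same(frames, kern) for kern in kernels]
    return [list(step) for step in zip(*branches)]


# Hypothetical input: a single PNCC coefficient tracked over 8 frames.
rng = random.Random(0)
pncc_track = [rng.uniform(-1.0, 1.0) for _ in range(8)]

# Hypothetical branch widths 3, 5, and 7 with simple averaging weights.
kernels = [[1.0 / w] * w for w in (3, 5, 7)]
features = parallel_multiscale(pncc_track, kernels)
```

Each entry of `features` corresponds to one frame and carries one value per branch, so narrow and wide temporal contexts are available side by side to whatever module consumes them next.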
Pages: 9