Enhanced artificial neural network-based SER model in low-resource Indian language

Cited by: 0
Authors
Chiradeep Mukherjee [1 ]
Piyash Mondal [1 ]
Kankana Sarkar [1 ]
Suman Paul [1 ]
Akash Saha [1 ]
Arindam Chakraborty [2 ]
Affiliations
[1] Department of CST & CSIT, Institute of Engineering and Management, University of Engineering and Management, Kolkata
[2] Department of ECE, Institute of Engineering and Management, University of Engineering and Management, Kolkata
Keywords
Accuracy; Artificial neural network; BanglaSER dataset; Chroma VQT; MFCC; Speech emotion recognition; STFT
DOI
10.1007/s41870-024-02310-1
Abstract
Speech emotion recognition (SER) is an emerging application that helps computers understand human intentions and preferences. However, SER in low-resource Indian languages such as Bengali remains challenging due to content variations, dialectal shifts, acoustic variability, and speaker age variations. This study presents an artificial neural network (ANN)-based model for speech emotion recognition in the Bengali language. The model uses spectral features, namely the short-time Fourier transform (STFT), chroma-VQT, Mel spectrogram, and Mel-frequency cepstral coefficients (MFCC), to recognize the emotions conveyed in speech. These four frequency-analysis transforms convert the time-domain signals of the BanglaSER dataset into spectral features. The proposed ANN model with the MFCC spectral feature, compiled with the Nadam optimizer, is reported to yield higher train and test accuracies than the other spectral-feature combinations. This work is the first attempt to employ an ANN-based deep-learning model for the Bengali SER task, and it achieves significantly higher accuracies than conventional machine-learning-based models. © Bharati Vidyapeeth's Institute of Computer Applications and Management 2024.
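The pipeline described above can be illustrated with a short Python sketch: the four spectral representations are extracted with librosa, averaged over time into fixed-length vectors, and fed to a small dense network compiled with the Nadam optimizer. This is a minimal sketch under stated assumptions, not the authors' implementation; the time-averaging pooling, layer widths, dropout rate, and label handling are illustrative choices, and chroma_vqt requires librosa 0.10 or later.

```python
import numpy as np
import librosa
import tensorflow as tf

def extract_features(path, sr=22050, n_mfcc=40):
    """Load one utterance and return a fixed-length feature vector.

    Each time-frequency representation is averaged over the time axis;
    this pooling step is an assumption made to feed a dense ANN.
    """
    y, sr = librosa.load(path, sr=sr)
    stft = np.abs(librosa.stft(y))                                       # STFT magnitude
    chroma = librosa.feature.chroma_vqt(y=y, sr=sr, intervals="equal")   # chroma from the variable-Q transform
    mel = librosa.feature.melspectrogram(y=y, sr=sr)                     # Mel spectrogram
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)               # MFCCs
    return np.concatenate([f.mean(axis=1) for f in (stft, chroma, mel, mfcc)])

def build_ann(input_dim, n_classes):
    """Small fully connected network; layer widths and dropout are assumptions."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    # The abstract reports the best results when compiling with the Nadam optimizer.
    model.compile(optimizer="nadam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage (wav_paths and integer emotion labels would come from the BanglaSER metadata):
# X = np.stack([extract_features(p) for p in wav_paths])
# y = np.array(labels)
# model = build_ann(X.shape[1], n_classes=len(set(labels)))
# model.fit(X, y, validation_split=0.2, epochs=50, batch_size=32)
```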
Pages: 263-277
Page count: 14