Development and Analysis of Convolutional Neural Network based Accurate Speech Emotion Recognition Models

Cited by: 0

Authors:

Vijayan, Divya M. [1]

Arun, A., V [1]

Ganeshnath, R. [2]

Nath, Ajay S. A. [1]

Roy, Rajesh Cherian [3]

Affiliations:

[1] Model Engn Coll Kochi, Dept Elect, Ernakulam, India

[2] TKM Coll Engn, Dept Elect, Kollam, India

[3] Muthoot Inst Technol & Sci, Dept Comp Sci, Ernakulam, India

Source:

2022 IEEE 19TH INDIA COUNCIL INTERNATIONAL CONFERENCE, INDICON | 2022

Keywords:

Speech Emotion Recognition; CNN; LSTM; Transformer encoder; Accuracy; RAVDESS dataset; CLASSIFICATION; DEEP;

DOI:

10.1109/INDICON56171.2022.10040174

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Automatic speech recognition is a major topic in artificial intelligence and machine learning, with the aim of developing machines that can communicate with humans through speech. Recently, with the emergence of the deep-learning paradigm, end-to-end models that extract features and train directly from the raw speech signal have been developed. With the goal of classifying emotions from speech more precisely, this paper presents a comparative analysis of two deep-learning architectures that improve on the accuracy of the models available in the literature. Using a combined CNN-LSTM architecture and a CNN-Transformer encoder architecture, this work analyses the complete deep-learning strategy for extracting distinct spatial and temporal features and classifying emotions from speech. Experiments are carried out on the RAVDESS dataset. Of the two networks, the CNN-Transformer encoder network achieves the higher accuracy, 82%, while the CNN-LSTM network achieves 74%.
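The abstract reports results as classification accuracy on RAVDESS. As a hedged illustration only, the sketch below shows how ground-truth emotion labels can be read from RAVDESS-style filenames and how an accuracy figure is computed from predicted and true labels. It relies on the dataset's published naming convention: seven hyphen-separated numeric fields, with the third giving the emotion and the seventh the actor. The function names `parse_ravdess_name` and `accuracy` are hypothetical, and nothing here is taken from the authors' own pipeline, which this record does not describe.

```python
from pathlib import PurePath

# Emotion codes from the RAVDESS filename convention (third field).
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def parse_ravdess_name(path):
    """Return (emotion, actor_id) decoded from a RAVDESS-style filename.

    Hypothetical helper: assumes names such as 03-01-06-01-02-01-12.wav.
    """
    fields = PurePath(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 hyphen-separated fields, got {len(fields)}")
    emotion = RAVDESS_EMOTIONS[fields[2]]
    actor_id = int(fields[6])
    return emotion, actor_id


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

For example, `parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav")` decodes the emotion field `06` and the actor field `12`. A model's predictions on a held-out set can then be scored with `accuracy(predicted_labels, true_labels)`.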


Pages: 6

References (18 in total):

[1] Dong LH, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P5884, DOI 10.1109/ICASSP.2018.8462506
[2] Er, Mehmet Bilal. A Novel Approach for Classification of Speech Emotions Based on Deep and Acoustic Features [J]. IEEE ACCESS, 2020, 8: 221640-221653
[3] Han K, 2014, INTERSPEECH, P223
[4] Han W, 2006, IEEE INT SYMP CIRC S, P145
[5] Issa, Dias; Demirci, M. Fatih; Yazici, Adnan. Speech emotion recognition with deep convolutional neural networks [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
[6] Lee J, 2015, 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, P1537
[7] Lee KH, 2020, I C INF COMM TECH CO, P1332, DOI 10.1109/ICTC49870.2020.9289227
[8] Lee, Kyong Hee; Choi, Hyun Kyun; Jang, Byung Tae; Kim, Do Hyun. A Study on Speech Emotion Recognition Using a Deep Neural Network [J]. 2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019: 1162-1165
[9] Lian, Zheng; Liu, Bin; Tao, Jianhua. CTNet: Conversational Transformer Network for Emotion Recognition [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29: 985-1000
[10] Mao, Qirong; Dong, Ming; Huang, Zhengwei; Zhan, Yongzhao. Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2014, 16(08): 2203-2213
