Development and Analysis of Convolutional Neural Network based Accurate Speech Emotion Recognition Models

Cited by: 0
Authors
Vijayan, Divya M. [1]
Arun, A., V [1]
Ganeshnath, R. [2]
Nath, Ajay S. A. [1]
Roy, Rajesh Cherian [3]
Affiliations
[1] Model Engn Coll Kochi, Dept Elect, Ernakulam, India
[2] TKM Coll Engn, Dept Elect, Kollam, India
[3] Muthoot Inst Technol & Sci, Dept Comp Sci, Ernakulam, India
Source
2022 IEEE 19TH INDIA COUNCIL INTERNATIONAL CONFERENCE, INDICON | 2022
Keywords
Speech Emotion Recognition; CNN; LSTM; Transformer encoder; Accuracy; RAVDESS dataset; CLASSIFICATION; DEEP
DOI
10.1109/INDICON56171.2022.10040174
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Automatic speech recognition is a major topic in artificial intelligence and machine learning, with the intent of developing machines that can communicate with humans through speech. Recently, with the emergence of the deep-learning paradigm, end-to-end models that extract features and train directly from the raw speech signal have been developed. With the goal of classifying emotions from speech more precisely, this paper presents a comparative analysis of two deep-learning architectures that improve on the accuracy of models available in the literature. Using a combined CNN-LSTM architecture and a CNN-Transformer encoder architecture, this work analyses the complete deep-learning strategy for extracting distinct spatial and temporal features and classifying emotions from speech. Experiments are carried out on the RAVDESS dataset. Of the two networks, the CNN-Transformer encoder network achieves the higher accuracy, 82%, while the CNN-LSTM network achieves 74%.
Pages: 6