ASERNet: Automatic speech emotion recognition system using MFCC-based LPC approach with deep learning CNN

Cited by: 8

Authors:

Jagadeeshwar, Kalyanapu [1]

Sreenivasarao, T. [2]

Pulicherla, Padmaja [3]

Satyanarayana, K. N. V. [4]

Lakshmi, K. Mohana [5]

Kumar, Pala Mahesh [6]

Affiliations:

[1] VIT AP Univ, Dept Comp Sci & Engn, Amaravati, Andhra Pradesh, India

[2] Seshadri Rao Gudlavalleru Engn Coll, Dept Comp Sci & Engn, Gudlavalleru, Andhra Pradesh, India

[3] Teegala Krishna Reddy Engn Coll, Dept Comp Sci & Engn, Hyderabad, Telangana, India

[4] Sagi Rama Krishnam Raju Engn Coll, Dept Elect & Commun Engn, Bhimavaram, Andhra Pradesh, India

[5] CMR Tech Campus, Dept Elect & Commun Engn, Hyderabad, Telangana, India

[6] SAK Informat, Dept Artificial Intelligence, Hyderabad, Telangana, India

Source:

INTERNATIONAL JOURNAL OF MODELING SIMULATION AND SCIENTIFIC COMPUTING | 2023, Vol. 14, Issue 04

Keywords:

Automatic speech emotion recognition; Mel-frequency cepstral coefficients; linear predictive coding; convolutional neural networks; FEATURES

DOI:

10.1142/S1793962323410295

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Automatic speech emotion recognition (ASER) from source speech signals is a challenging task because recognition accuracy depends heavily on the speech features extracted for emotion classification. The pre-processing and classification phases also play a key role in improving the accuracy of an ASER system. This paper therefore proposes a deep learning convolutional neural network (DLCNN)-based ASER model, hereafter denoted ASERNet. Speech denoising is performed with spectral subtraction (SS), and deep features are extracted by integrating linear predictive coding (LPC) with Mel-frequency cepstral coefficients (MFCCs). Finally, the DLCNN classifies the emotion of the speech from the extracted LPC-MFCC features. The simulation results demonstrate the superior performance of the proposed ASERNet model in terms of accuracy, precision, recall, and F1-score compared with state-of-the-art ASER approaches.
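To make the LPC part of the feature stage concrete, the following is a minimal sketch, not the ASERNet implementation. It estimates LPC coefficients for one speech frame with the autocorrelation method and the Levinson-Durbin recursion, then joins them with an MFCC vector. The abstract does not say how LPC and MFCC are integrated, so plain concatenation is an assumption here, and the function names `autocorrelation`, `lpc_coefficients`, and `fuse_features` are hypothetical. MFCC extraction, spectral subtraction, and the CNN classifier are left out.

```python
"""Illustrative LPC feature sketch; not the ASERNet implementation."""


def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(frame, order):
    """Estimate predictor coefficients a[1..order] with Levinson-Durbin.

    The model is x[n] ~ sum_k a[k] * x[n - k]. A silent (all-zero) frame
    yields all-zero coefficients.
    """
    r = autocorrelation(frame, order)
    error = r[0]  # prediction error energy of the order-0 model
    coeffs = []
    for i in range(1, order + 1):
        if error <= 0.0:
            # No prediction error left (or silent frame): pad with zeros.
            coeffs.append(0.0)
            continue
        # Reflection coefficient for model order i.
        acc = r[i] - sum(coeffs[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / error
        # Update the lower-order coefficients, then append the new one.
        coeffs = [coeffs[j] - k * coeffs[i - 2 - j] for j in range(i - 1)] + [k]
        error *= 1.0 - k * k
    return coeffs


def fuse_features(mfcc_vector, lpc_vector):
    """Concatenate per-frame MFCC and LPC vectors (assumed fusion rule)."""
    return list(mfcc_vector) + list(lpc_vector)


if __name__ == "__main__":
    # A decaying exponential is predicted almost perfectly by one coefficient.
    frame = [0.9 ** n for n in range(200)]
    lpc = lpc_coefficients(frame, 2)
    placeholder_mfcc = [0.0] * 13  # stand-in values, not real MFCCs
    print(len(fuse_features(placeholder_mfcc, lpc)))
```

In a fuller pipeline the fused per-frame vectors would be stacked over time and passed to the classifier. The frame length, LPC order, and number of MFCCs used here are illustrative choices and are not taken from the paper.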

Pages: 22