Pattern recognition and features selection for speech emotion recognition model using deep learning

Times Cited: 26
Authors
Jermsittiparsert, Kittisak [1 ]
Abdurrahman, Abdurrahman [2 ]
Siriattakul, Parinya [3 ]
Sundeeva, Ludmila A. [4 ]
Hashim, Wahidah [5 ]
Rahim, Robbi [6 ]
Maseleno, Andino [7 ]
Affiliations
[1] Ton Duc Thang Univ, Ho Chi Minh City, Vietnam
[2] Lampung Univ, Phys Educ Dept, Tanjungkarang, Indonesia
[3] Univ Queensland, Sch Psychol, Brisbane, Qld, Australia
[4] Togliatti State Univ, Tolyatti, Russia
[5] Univ Tenaga Nas, Inst Informat & Comp Energy, Kajang, Malaysia
[6] Sekolah Tinggi Ilmu Manajemen Sukma, Medan, Indonesia
[7] STMIK Pringsewu, Dept Informat Syst, Pringsewu, Lampung, Indonesia
Keywords
Deep learning; Speech; Emotion recognition; Feature extraction; Classification
DOI
10.1007/s10772-020-09690-2
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809
Abstract
Automatic speaker recognition models are founded on building various models of speaker characterization, pattern analysis and engineering. This work focuses on the effect of classification and feature selection methods on speech emotion recognition. Selecting the right parameters in combination with the classifier is an important part of minimizing the computational complexity of the system, and becomes essential particularly for models deployed in real-time scenarios. In this paper, a new deep learning based speech recognition model is presented for automatically recognizing spoken words. The quality of the input source, i.e. the speech sound, has a direct impact on the accuracy attainable by the classifier. The Berlin database consists of around 500 utterances from both male and female speakers. On this dataset, the presented model achieves maximum accuracies of 94.21%, 83.54%, 83.65% and 78.13% using MFCC, prosodic, LSP and LPC features, respectively. The presented model offered better recognition performance than the other compared methods.
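The record does not describe the authors' exact architecture or feature pipeline; purely as an illustrative sketch of the feature-extraction-plus-classifier workflow the abstract outlines, the snippet below computes utterance-level MFCC statistics with librosa and fits a small feed-forward network with scikit-learn. The helper names, the assumed label handling (e.g. emotions parsed from EMO-DB file names), and the network size are assumptions, not the paper's configuration.

# Minimal sketch (not the authors' exact pipeline): utterance-level MFCC
# statistics fed to a small feed-forward classifier. Paths, labels and the
# network shape are illustrative assumptions.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    # Mean and standard deviation of MFCCs over the utterance
    # give a fixed-length feature vector per recording.
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_emotion_classifier(wav_files, labels):
    # wav_files / labels are assumed to come from the Berlin (EMO-DB) corpus,
    # with the emotion label taken from each file's annotation.
    X = np.stack([mfcc_features(p) for p in wav_files])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500,
                        random_state=0)
    clf.fit(X_tr, y_tr)
    # Held-out accuracy gives a rough sense of the MFCC-only setting;
    # the paper reports results for MFCC, prosodic, LSP and LPC features.
    return clf, clf.score(X_te, y_te)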
Pages: 799-806
Number of pages: 8