Implementation and Comparison of Speech Emotion Recognition System using Gaussian Mixture Model (GMM) and K-Nearest Neighbor (K-NN) techniques

Cited by: 78

Authors:

Lanjewar, Rahul B. [1]

Mathurkar, Swarup [2]

Patel, Nilesh

Affiliations:

[1] Dr Babasaheb Ambedkar Coll Engn & Res, Dept Elect, Nagpur 441110, Maharashtra, India

[2] Govt Coll Engn, Dept EXTC, Amravati 444604, Maharashtra, India

Source:

PROCEEDINGS OF 4TH INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND CONTROL (ICAC3'15) | 2015 / Vol. 49

Keywords:

Speech Features; Emotion; MFCC; wavelet; pitch; K-NN; GMM; Database; FREQUENCY

DOI:

10.1016/j.procs.2015.04.226

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Interaction between humans and machines has become a technology trend in which machines are expected to respond with the user's emotional state in mind. Advances in signal processing and machine learning have raised machine intelligence to the point where machines can recognize human emotions. By combining speech processing with pattern recognition algorithms, an intelligent, emotion-specific man-machine interaction can be achieved and harnessed to design smart, secure automated homes as well as commercial applications. This paper focuses on the implementation of a speech emotion recognition system that uses the spectral components of Mel Frequency Cepstrum Coefficients (MFCC), wavelet features of speech, and the pitch of vocal traces. Two machine learning algorithms, the Gaussian Mixture Model (GMM) and K-Nearest Neighbour (K-NN), are used to classify six emotional categories (happy, angry, neutral, surprised, fearful and sad) from a standard speech database, the Berlin emotion database (BES). The two algorithms are then compared in a performance analysis supported by the confusion matrix. © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
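The two classifiers named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 2-D synthetic vectors stand in for real MFCC, wavelet and pitch features, only two of the six emotion labels are used, the GMM uses diagonal covariances with one model trained per emotion, and all function names and parameter values (`k=3`, two mixture components, 20 EM iterations) are hypothetical.

```python
import math
import random


def knn_predict(train_x, train_y, x, k=3):
    """K-NN: majority vote among the k nearest training vectors (Euclidean)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    labels = [label for _, label in nearest]
    return max(sorted(set(labels)), key=labels.count)


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(data, n_components=2, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM with plain EM (illustrative, not tuned)."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(p) for p in rng.sample(data, n_components)]
    variances = [[1.0] * dim for _ in range(n_components)]
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = [math.log(w) + log_gauss_diag(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means and variances.
        for c in range(n_components):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / len(data)
            means[c] = [sum(r[c] * x[d] for r, x in zip(resp, data)) / nc
                        for d in range(dim)]
            variances[c] = [
                max(var_floor,
                    sum(r[c] * (x[d] - means[c][d]) ** 2
                        for r, x in zip(resp, data)) / nc)
                for d in range(dim)]
    return weights, means, variances


def gmm_log_likelihood(x, model):
    weights, means, variances = model
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                      for w, m, v in zip(weights, means, variances)])


def gmm_predict(models, x):
    """Pick the emotion whose GMM assigns x the highest log-likelihood."""
    return max(sorted(models), key=lambda label: gmm_log_likelihood(x, models[label]))


def make_cluster(center, n, rng):
    """Synthetic stand-in for per-utterance feature vectors (not real speech)."""
    return [tuple(rng.gauss(c, 1.0) for c in center) for _ in range(n)]


rng = random.Random(0)
data = {"angry": make_cluster((3.0, 3.0), 40, rng),
        "sad": make_cluster((-3.0, -3.0), 40, rng)}

train_x = [x for points in data.values() for x in points]
train_y = [label for label, points in data.items() for _ in points]
models = {label: fit_gmm(points) for label, points in data.items()}

query = (3.0, 3.0)
knn_label = knn_predict(train_x, train_y, query)
gmm_label = gmm_predict(models, query)
print("K-NN:", knn_label, "| GMM:", gmm_label)
```

In a real system the synthetic clusters would be replaced by feature vectors extracted from labelled recordings, and the predictions of each classifier on held-out utterances would be tallied into a confusion matrix for the comparison the abstract describes.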


Pages: 50-57

Page count: 8
