Speech Emotion Recognition Using CNN

Cited by: 219

Authors:

Huang, Zhengwei [1]

Dong, Ming [2]

Mao, Qirong [1]

Zhan, Yongzhao [1]

Affiliations:

[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China

[2] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA

Source:

PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14) | 2014

Keywords:

Speech emotion recognition; Salient feature learning;

DOI:

10.1145/2647868.2654984

Chinese Library Classification number:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

Deep learning systems, such as Convolutional Neural Networks (CNNs), can infer a hierarchical representation of input data that facilitates categorization. In this paper, we propose to learn affect-salient features for Speech Emotion Recognition (SER) using a semi-CNN. The training of the semi-CNN has two stages. In the first stage, unlabeled samples are used to learn candidate features with a contractive convolutional neural network with reconstruction penalization. In the second stage, the candidate features are used as the input to the semi-CNN to learn affect-salient, discriminative features using a novel objective function that encourages feature saliency, orthogonality and discrimination. Our experimental results on benchmark datasets show that our approach leads to stable and robust recognition performance in complex scenes (e.g., with speaker and environment distortion), and outperforms several well-established SER features.
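As a rough illustration of the first-stage idea (a reconstruction term combined with a contractive penalty), the sketch below computes such a loss for a small fully connected, tied-weight sigmoid autoencoder. This is not the paper's formulation: the abstract does not give the objective, and the dense encoder here stands in for the convolutional one purely for brevity. The function name `contractive_reconstruction_loss`, the weight `lam`, and all shapes are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_reconstruction_loss(x, W, b, c, lam=0.1):
    """Reconstruction error plus a contractive penalty.

    Hypothetical stand-in for an unsupervised first stage:
    x: (n, d) unlabeled inputs, W: (d, k) tied weights,
    b: (k,) encoder bias, c: (d,) decoder bias.
    """
    h = sigmoid(x @ W + b)      # hidden features, shape (n, k)
    x_hat = h @ W.T + c         # linear decoder with tied weights
    # Mean squared reconstruction error per sample.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # Squared Frobenius norm of the encoder Jacobian dh/dx:
    # sum_j (h_j * (1 - h_j))^2 * ||W[:, j]||^2, averaged over samples.
    jac = np.mean(((h * (1.0 - h)) ** 2) @ np.sum(W ** 2, axis=0))
    return recon + lam * jac, recon, jac

# Tiny usage example with random data (assumed shapes only).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
W = rng.normal(scale=0.1, size=(5, 3))
total, recon, jac = contractive_reconstruction_loss(
    x, W, np.zeros(3), np.zeros(5), lam=0.1
)
```

The penalty term shrinks the sensitivity of the learned features to small input perturbations, which is one common way to motivate robustness to distortions such as speaker or environment variation; whether the paper uses this exact form is not stated in the abstract.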


Pages: 801-804

Page count: 4

