Universum Autoencoder-Based Domain Adaptation for Speech Emotion Recognition

Cited by: 108
Authors
Deng, Jun [1]
Xu, Xinzhou [2]
Zhang, Zixing [1]
Frühholz, Sascha [3,4,5,6]
Schuller, Björn [1,7]
Affiliations
[1] Univ Passau, Chair Complex & Intelligent Syst, D-94032 Passau, Germany
[2] Tech Univ Munich, Machine Intelligence & Signal Proc Grp, D-80333 Munich, Germany
[3] Univ Zurich, Inst Psychol, CH-8006 Zurich, Switzerland
[4] Univ Zurich, Neurosci Ctr Zurich, CH-8092 Zurich, Switzerland
[5] ETH, CH-8092 Zurich, Switzerland
[6] Univ Zurich, Ctr Integrat Human Physiol, CH-8006 Zurich, Switzerland
[7] Imperial Coll London, Dept Comp, London, England
Funding
Swiss National Science Foundation
关键词
Deep learning; domain adaptation; speech emotion recognition; universum autoencoders (U-AE);
DOI
10.1109/LSP.2017.2672753
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
One of the serious obstacles to the application of speech emotion recognition systems in real-life settings is the poor generalization of the emotion classifiers. Many recognition systems suffer a dramatic drop in performance when tested on speech data obtained from different speakers, acoustic environments, linguistic content, and domain conditions. In this letter, we propose a novel unsupervised domain adaptation model, called Universum autoencoders, to improve the performance of systems evaluated under mismatched training and test conditions. To address the mismatch, our proposed model not only learns discriminative information from labeled data, but also incorporates prior knowledge from unlabeled data into the learning process. Experimental results on the labeled Geneva Whispered Emotion Corpus database and three other unlabeled databases demonstrate the effectiveness of the proposed method compared to other domain adaptation methods.
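The abstract describes an objective that combines a reconstruction term, a supervised term on labeled source data, and a Universum-style term on unlabeled target data. The sketch below is only an illustrative reading of that idea, not the paper's actual formulation: the network sizes, the linear encoder/decoder, the weighting factors `lam` and `mu`, and the specific "push predictions toward 0.5" universum penalty are all assumptions made for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: labeled source-domain features and an
# unlabeled target-domain batch (dimensions chosen arbitrarily).
X_src = rng.normal(size=(8, 16))    # labeled source-domain batch
y_src = rng.integers(0, 2, size=8)  # illustrative binary emotion labels
X_tgt = rng.normal(size=(8, 16))    # unlabeled target-domain batch

# Tiny linear autoencoder plus a linear classifier head on the code layer.
W_enc = rng.normal(scale=0.1, size=(16, 4))
W_dec = rng.normal(scale=0.1, size=(4, 16))
w_clf = rng.normal(scale=0.1, size=4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def universum_ae_loss(X_src, y_src, X_tgt, lam=1.0, mu=0.1):
    """One plausible combined objective: reconstruct unlabeled target data,
    classify labeled source data, and keep classifier outputs on the
    unlabeled (universum) data maximally uncertain."""
    # Reconstruction term on unlabeled target-domain data.
    code_tgt = X_tgt @ W_enc
    recon = code_tgt @ W_dec
    l_rec = np.mean((recon - X_tgt) ** 2)
    # Supervised cross-entropy on labeled source data.
    p_src = sigmoid(X_src @ W_enc @ w_clf)
    l_sup = -np.mean(y_src * np.log(p_src) + (1 - y_src) * np.log(1 - p_src))
    # Universum term: push target-domain predictions toward 0.5,
    # i.e. toward the decision boundary (an assumed penalty form).
    p_tgt = sigmoid(X_tgt @ W_enc @ w_clf)
    l_uni = np.mean((p_tgt - 0.5) ** 2)
    return l_rec + lam * l_sup + mu * l_uni

loss = universum_ae_loss(X_src, y_src, X_tgt)
print(loss)  # a finite positive scalar; exact value depends on the seed
```

In an actual training setup the three terms would be minimized jointly by gradient descent over the shared encoder, so the learned code layer must serve both reconstruction of the target domain and discrimination on the source domain.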
Pages: 500-504 (5 pages)