Cross-Corpus Speech Emotion Recognition Based on Deep Domain-Adaptive Convolutional Neural Network

被引：18

作者：

Liu, Jiateng ^{[1
]}

Zheng, Wenming ^{[1
]}

Zong, Yuan ^{[1
]}

Lu, Cheng ^{[2
]}

Tang, Chuangao ^{[1
]}

机构：

[1] Southeast Univ, Minist Educ, Key Lab Child Dev & Learning Sci, Nanjing 210096, Peoples R China

[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China

来源：

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2020年 / E103D卷 / 02期

基金：

中国国家自然科学基金;

关键词：

cross-corpus speech emotion recognition; deep convolutional neural network; domain adaptation;

D O I：

10.1587/transinf.2019EDL8136

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

In this letter, we propose a novel deep domain-adaptive convolutional neural network (DDACNN) model to handle the challenging cross-corpus speech emotion recognition (SER) problem. The framework of the DDACNN model consists of two components: a feature extraction model based on a deep convolutional neural network (DCNN) and a domain-adaptive (DA) layer added in the DCNN utilizing the maximum mean discrepancy (MMD) criterion. We use labeled spectrograms from source speech corpus combined with unlabeled spectrograms from target speech corpus as the input of two classic DCNNs to extract the emotional features of speech, and train the model with a special mixed loss combined with a cross-entrophy loss and an MMD loss. Compared to other classic cross-corpus SER methods, the major advantage of the DDACNN model is that it can extract robust speech features which are time-frequency related by spectrograms and narrow the discrepancies between feature distribution of source corpus and target corpus to get better cross-corpus performance. Through several cross-corpus SER experiments, our DDACNN achieved the state-of-the-art performance on three public emotion speech corpora and is proved to handle the cross-corpus SER problem efficiently.

引用

页码：459 / 463

页数：5

共 20 条

[1] [Anonymous], 2005, P INTERSPEECH
[2] [Anonymous], 2009, 10 ANN C INT SPEECH
[3] Badshah AM, 2017, 2017 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON), P125
[4] Survey on speech emotion recognition: Features, classification schemes, and databases
El Ayadi, Moataz
Kamel, Mohamed S.
Karray, Fakhri
[J]. PATTERN RECOGNITION, 2011, 44 (03) : 572 - 587
[5] Eyben F, 2010, P 18 ACM INT C MULT
[6] Gretton A., 2007, ADV NEURAL INFORM PR, V19, P513, DOI DOI 10.7551/MITPRESS/7503.003.0069
[7] ImageNet Classification with Deep Convolutional Neural Networks
Krizhevsky, Alex
Sutskever, Ilya
Hinton, Geoffrey E.
[J]. COMMUNICATIONS OF THE ACM, 2017, 60 (06) : 84 - 90
[8] Gradient-based learning applied to document recognition
Lecun, Y
Bottou, L
Bengio, Y
Haffner, P
[J]. PROCEEDINGS OF THE IEEE, 1998, 86 (11) : 2278 - 2324
[9] Li X, 2008, PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON PRODUCT INNOVATION MANAGEMENT, VOLS I AND II, P86
[10] Long M., 2017, Proc Mach Learn Res, P2208

← 1 2 →