Cross-corpus speech emotion recognition using semi-supervised transfer non-negative matrix factorization with adaptation regularization

Cited by: 10

Authors:

Luo, Hui [1]

Han, Jiqing [1]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China

Source:

INTERSPEECH 2019 | 2019

Funding:

U.S. National Science Foundation;

Keywords:

cross-corpus; speech emotion recognition; semi-supervised transfer NMF; adaptation regularization;

DOI:

10.21437/Interspeech.2019-2041

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

This paper addresses cross-corpus speech emotion recognition (SER), in which the training corpus and the testing corpus are mismatched. The labels of the training corpus are known, while the labels of the testing corpus are entirely unknown. To alleviate the influence of these mismatches on the recognition system in this setting, we present a non-negative matrix factorization (NMF) based cross-corpus SER method called semi-supervised adaptation regularized transfer NMF (SATNMF). The core idea of SATNMF is to incorporate the label information of the training corpus into NMF and to seek a latent low-rank feature space in which the marginal and conditional distribution differences between the two corpora are minimized simultaneously. Specifically, in this induced feature space, the maximum mean discrepancy (MMD) criterion measures the discrepancy not only between the two corpora but also between corresponding classes within the two corpora. Moreover, to further exploit knowledge of the marginal distributions, their underlying manifold structure is taken into account through manifold regularization. Experiments on four popular emotional corpora show that the proposed method achieves higher recognition accuracy than state-of-the-art methods.
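The two MMD terms described in the abstract can be illustrated with a minimal sketch. This is not the SATNMF implementation: it assumes a linear kernel (the abstract does not name one), in which case the squared MMD reduces to the squared distance between feature means. Because testing-corpus labels are unknown, the class-wise term below assumes pseudo-labels predicted for the testing corpus. The function names `linear_mmd` and `class_conditional_mmd` are hypothetical.

```python
import numpy as np


def linear_mmd(source, target):
    """Squared MMD under a linear kernel (assumption): the squared
    Euclidean distance between the two sample means."""
    diff = source.mean(axis=0) - target.mean(axis=0)
    return float(diff @ diff)


def class_conditional_mmd(source, source_labels, target, target_pseudo_labels):
    """Sum of per-class linear MMD terms.

    target_pseudo_labels is an assumption: the testing corpus has no
    ground-truth labels, so some classifier must supply estimates.
    """
    total = 0.0
    for c in np.unique(source_labels):
        src_c = source[source_labels == c]
        tgt_c = target[target_pseudo_labels == c]
        if len(src_c) == 0 or len(tgt_c) == 0:
            continue  # class absent from one corpus; no term to add
        total += linear_mmd(src_c, tgt_c)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy non-negative features standing in for two emotional corpora.
    train_feats = rng.random((60, 8))
    test_feats = rng.random((40, 8)) + 0.5  # shifted to mimic a corpus mismatch
    train_labels = rng.integers(0, 4, size=60)
    test_pseudo = rng.integers(0, 4, size=40)

    print("marginal MMD:", linear_mmd(train_feats, test_feats))
    print("conditional MMD:",
          class_conditional_mmd(train_feats, train_labels, test_feats, test_pseudo))
```

In a method of this kind, both quantities would be evaluated on the learned low-rank representation rather than on the raw features and added to the factorization objective as penalties.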


Pages: 3247-3251

Page count: 5

