Cross-corpus speech emotion recognition using semi-supervised transfer non-negative matrix factorization with adaptation regularization

Cited by: 10
Authors
Luo, Hui [1]
Han, Jiqing [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Source
INTERSPEECH 2019 | 2019
Funding
U.S. National Science Foundation;
Keywords
cross-corpus; speech emotion recognition; semi-supervised transfer NMF; adaptation regularization;
DOI
10.21437/Interspeech.2019-2041
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
This paper focuses on a cross-corpus speech emotion recognition (SER) task, in which there are mismatches between the training corpus and the testing corpus. In this setting, the label information of the training corpus is known, while that of the testing corpus is entirely unknown. To alleviate the influence of these mismatches on the recognition system, we present a non-negative matrix factorization (NMF) based cross-corpus speech emotion recognition method, called semi-supervised adaptation regularized transfer NMF (SATNMF). The core idea of SATNMF is to incorporate the label information of the training corpus into NMF and seek a latent low-rank feature space in which the marginal and conditional distribution differences between the two corpora are minimized simultaneously. Specifically, in this induced feature space, the maximum mean discrepancy (MMD) criterion is used to measure the discrepancy not only between the two corpora, but also for each class within the two corpora. Moreover, to further exploit the knowledge of the marginal distributions, their underlying manifold structure is taken into account through manifold regularization. Experiments on four popular emotional corpora show that the proposed method achieves better recognition accuracies than state-of-the-art methods.
Pages: 3247 - 3251
Page count: 5
Related papers
29 in total
  • [1] Altrov R., 2012, INT WORKSH CORP RES
  • [2] Integrating structured biological data by Kernel Maximum Mean Discrepancy
    Borgwardt, Karsten M.
    Gretton, Arthur
    Rasch, Malte J.
    Kriegel, Hans-Peter
    Schoelkopf, Bernhard
    Smola, Alex J.
    [J]. BIOINFORMATICS, 2006, 22 (14) : E49 - E57
  • [3] Burkhardt F., 2005, INTERSPEECH, V5, P1517, DOI 10.21437/INTERSPEECH.2005-446
  • [4] Graph Regularized Nonnegative Matrix Factorization for Data Representation
    Cai, Deng
    He, Xiaofei
    Han, Jiawei
    Huang, Thomas S.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33 (08) : 1548 - 1560
  • [5] Chen Minmin, 2012, P 29 INT C MACH LEAR
  • [6] ChineseLDC, 2005, CASIA CHIN EM SPEECH
  • [7] Universum Autoencoder-Based Domain Adaptation for Speech Emotion Recognition
    Deng, Jun
    Xu, Xinzhou
    Zhang, Zixing
    Fruhholz, Sascha
    Schuller, Bjorn
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2017, 24 (04) : 500 - 504
  • [8] Autoencoder-based Unsupervised Domain Adaptation for Speech Emotion Recognition
    Deng, Jun
    Zhang, Zixing
    Eyben, Florian
    Schuller, Bjoern
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (09) : 1068 - 1072
  • [9] Sparse Autoencoder-based Feature Transfer Learning for Speech Emotion Recognition
    Deng, Jun
    Zhang, Zixing
    Marchi, Erik
    Schuller, Bjoern
    [J]. 2013 HUMAINE ASSOCIATION CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2013, : 511 - 516
  • [10] Eyben Florian, 2010, P 18 ACM INT C MULT, P1459