Cross-Corpus Speech Emotion Recognition Based on Deep Domain-Adaptive Convolutional Neural Network

Cited by: 18
Authors
Liu, Jiateng [1]
Zheng, Wenming [1]
Zong, Yuan [1]
Lu, Cheng [2]
Tang, Chuangao [1]
Affiliations
[1] Southeast Univ, Minist Educ, Key Lab Child Dev & Learning Sci, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
cross-corpus speech emotion recognition; deep convolutional neural network; domain adaptation
DOI
10.1587/transinf.2019EDL8136
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this letter, we propose a novel deep domain-adaptive convolutional neural network (DDACNN) model to handle the challenging cross-corpus speech emotion recognition (SER) problem. The DDACNN framework consists of two components: a feature extraction model based on a deep convolutional neural network (DCNN), and a domain-adaptive (DA) layer added to the DCNN that uses the maximum mean discrepancy (MMD) criterion. We use labeled spectrograms from the source speech corpus, combined with unlabeled spectrograms from the target speech corpus, as the input of two classic DCNNs to extract the emotional features of speech, and we train the model with a mixed loss that combines a cross-entropy loss and an MMD loss. Compared to other classic cross-corpus SER methods, the major advantage of the DDACNN model is twofold: it extracts robust time-frequency speech features from spectrograms, and it narrows the discrepancy between the feature distributions of the source and target corpora, which yields better cross-corpus performance. In several cross-corpus SER experiments on three public emotional speech corpora, DDACNN achieved state-of-the-art performance and is shown to handle the cross-corpus SER problem effectively.
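The mixed loss described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not state the MMD kernel or how the two terms are weighted, so the linear-kernel MMD, the weight `lam`, and all function names below are assumptions made for illustration only.

```python
import math

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of labeled (source) logit rows."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)

def mean_vector(features):
    """Column-wise mean of a batch of feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def linear_mmd2(source_feats, target_feats):
    """Squared MMD with a linear kernel (assumed here):
    the squared distance between the source and target feature means."""
    mu_s = mean_vector(source_feats)
    mu_t = mean_vector(target_feats)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

def mixed_loss(src_logits, src_labels, src_feats, tgt_feats, lam=1.0):
    """Classification loss on labeled source data plus a weighted MMD term
    computed on DA-layer features of source and unlabeled target data.
    `lam` is a hypothetical trade-off weight."""
    return cross_entropy(src_logits, src_labels) + lam * linear_mmd2(src_feats, tgt_feats)
```

Only the source batch contributes labels. The target batch enters through the MMD term alone, which is what allows unlabeled target spectrograms to influence training.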
Pages: 459 - 463
Number of pages: 5