Cross-Corpus Speech Emotion Recognition Based on Deep Domain-Adaptive Convolutional Neural Network

Cited by: 18
Authors
Liu, Jiateng [1]
Zheng, Wenming [1]
Zong, Yuan [1]
Lu, Cheng [2]
Tang, Chuangao [1]
Affiliations
[1] Southeast Univ, Minist Educ, Key Lab Child Dev & Learning Sci, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
cross-corpus speech emotion recognition; deep convolutional neural network; domain adaptation
DOI
10.1587/transinf.2019EDL8136
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this letter, we propose a novel deep domain-adaptive convolutional neural network (DDACNN) model to handle the challenging cross-corpus speech emotion recognition (SER) problem. The DDACNN framework consists of two components: a feature extraction model based on a deep convolutional neural network (DCNN), and a domain-adaptive (DA) layer added to the DCNN that uses the maximum mean discrepancy (MMD) criterion. We feed labeled spectrograms from the source speech corpus, together with unlabeled spectrograms from the target speech corpus, into two classic DCNNs to extract emotional features of speech, and train the model with a mixed loss that combines a cross-entropy loss and an MMD loss. Compared with other classic cross-corpus SER methods, the major advantage of the DDACNN model is that it extracts robust time-frequency speech features from spectrograms and narrows the discrepancy between the feature distributions of the source and target corpora, which yields better cross-corpus performance. In several cross-corpus SER experiments, DDACNN achieved state-of-the-art performance on three public emotional speech corpora and was shown to handle the cross-corpus SER problem efficiently.
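The mixed loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the kernel, the loss weighting, or the layer at which MMD is computed, so the Gaussian kernel, the bandwidth `sigma`, the weight `lam`, and all function names below are assumptions chosen for illustration. The sketch uses plain Python lists in place of network activations.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    # Assumed kernel choice; the abstract does not name one.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(source, target, sigma=1.0):
    # Biased estimate of squared MMD between two batches of feature vectors.
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

def cross_entropy(logits, label):
    # Softmax cross-entropy for one sample, using a stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def mixed_loss(src_logits, src_labels, src_feats, tgt_feats, lam=1.0, sigma=1.0):
    # Classification loss on labeled source samples plus a weighted MMD term
    # on source and (unlabeled) target features. `lam` is a hypothetical weight.
    ce = sum(cross_entropy(l, y) for l, y in zip(src_logits, src_labels))
    ce /= len(src_labels)
    return ce + lam * mmd_squared(src_feats, tgt_feats, sigma)

if __name__ == "__main__":
    src_feats = [[0.0, 0.0], [1.0, 0.0]]   # toy source-corpus features
    tgt_feats = [[5.0, 5.0], [6.0, 5.0]]   # toy target-corpus features
    src_logits = [[2.0, 0.5], [0.1, 1.5]]  # toy classifier outputs
    src_labels = [0, 1]
    print(mixed_loss(src_logits, src_labels, src_feats, tgt_feats, lam=0.5))
```

Only the source samples contribute to the cross-entropy term because the target corpus is unlabeled; the MMD term is what ties the two corpora together, shrinking toward zero as their feature distributions move closer.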
Pages: 459-463
Number of pages: 5
Related papers
50 in total
  • [31] Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition
    Latif, Siddique
    Rana, Rajib
    Khalifa, Sara
    Jurdak, Raja
    Schuller, Bjorn
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (03) : 1912 - 1926
  • [32] Contrastive Learning for Domain Transfer in Cross-Corpus Emotion Recognition
    Yin, Yufeng
    Lu, Liupei
    Xiao, Yao
    Xu, Zhi
    Cai, Kaijie
    Jiang, Haonan
    Gratch, Jonathan
    Soleymani, Mohammad
    2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2021,
  • [33] Unsupervised Domain Adaptation Integrating Transformers and Mutual Information for Cross-Corpus Speech Emotion Recognition
    Zhang, Shiqing
    Liu, Ruixin
    Yang, Yijiao
    Zhao, Xiaoming
    Yu, Jun
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [34] Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition
    Lu, Cheng
    Zong, Yuan
    Tang, Chuangao
    Lian, Hailun
    Chang, Hongli
    Zhu, Jie
    Li, Sunan
    Zhao, Yan
    ELECTRONICS, 2022, 11 (17)
  • [35] Synthesized speech for model training in cross-corpus recognition of human emotion
    Schuller, Bjorn
    Zhang, Zixing
    Weninger, Felix
    Burkhardt, Felix
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (03) : 313 - 323
  • [36] CROSS-CORPUS EEG-BASED EMOTION RECOGNITION
    Rayatdoost, Soheil
    Soleymani, Mohammad
    2018 IEEE 28TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2018,
  • [37] Cross-Corpus Speech Emotion Recognition Based on Joint Transfer Subspace Learning and Regression
    Zhang, Weijian
    Song, Peng
    Chen, Dongliang
    Sheng, Chao
    Zhang, Wenjing
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2022, 14 (02) : 588 - 598
  • [38] A Cross-Corpus Recognition of Emotional Speech
    Xiao, Zhongzhe
    Wu, Di
    Zhang, Xiaojun
    Tao, Zhi
    PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2016, : 42 - 46
  • [39] Transfer Linear Subspace Learning for Cross-Corpus Speech Emotion Recognition
    Song, Peng
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (02) : 265 - 275
  • [40] Synthesized speech for model training in cross-corpus recognition of human emotion
    Björn Schuller
    Zixing Zhang
    Felix Weninger
    Felix Burkhardt
    International Journal of Speech Technology, 2012, 15 (3) : 313 - 323