Cross-Corpus Speech Emotion Recognition Based on Deep Domain-Adaptive Convolutional Neural Network

Cited by: 18
Authors
Liu, Jiateng [1 ]
Zheng, Wenming [1 ]
Zong, Yuan [1 ]
Lu, Cheng [2 ]
Tang, Chuangao [1 ]
Affiliations
[1] Southeast Univ, Minist Educ, Key Lab Child Dev & Learning Sci, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
cross-corpus speech emotion recognition; deep convolutional neural network; domain adaptation;
DOI
10.1587/transinf.2019EDL8136
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
In this letter, we propose a novel deep domain-adaptive convolutional neural network (DDACNN) model to handle the challenging cross-corpus speech emotion recognition (SER) problem. The DDACNN model consists of two components: a feature extraction model based on a deep convolutional neural network (DCNN), and a domain-adaptive (DA) layer added to the DCNN that exploits the maximum mean discrepancy (MMD) criterion. We use labeled spectrograms from the source speech corpus combined with unlabeled spectrograms from the target speech corpus as the input of two classic DCNNs to extract emotional speech features, and train the model with a mixed loss combining a cross-entropy loss and an MMD loss. Compared with other classic cross-corpus SER methods, the major advantage of the DDACNN model is that it extracts robust, time-frequency-related speech features from spectrograms and narrows the discrepancy between the feature distributions of the source and target corpora, yielding better cross-corpus performance. In several cross-corpus SER experiments, our DDACNN achieved state-of-the-art performance on three public emotional speech corpora and is shown to handle the cross-corpus SER problem effectively.
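The mixed objective described in the abstract — a supervised cross-entropy loss on labeled source samples plus an MMD term that pulls the source and target feature distributions together — can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the linear-kernel form of MMD (squared distance between empirical mean embeddings), and the trade-off weight `lam` are all assumptions for exposition.

```python
import numpy as np

def linear_mmd(source_feats, target_feats):
    """Linear-kernel MMD: squared Euclidean distance between the
    empirical mean embeddings of the two feature batches."""
    diff = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(diff @ diff)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class.
    probs: (N, C) predicted class probabilities; labels: (N,) int class ids."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))

def mixed_loss(probs, labels, source_feats, target_feats, lam=1.0):
    """Mixed objective: supervised cross-entropy on the labeled source
    batch plus lam * MMD between source and target feature batches."""
    return cross_entropy(probs, labels) + lam * linear_mmd(source_feats, target_feats)
```

When the source and target feature batches already share the same mean embedding, the MMD term vanishes and the objective reduces to plain cross-entropy; a larger `lam` trades classification accuracy on the source corpus for tighter domain alignment.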
Pages: 459-463
Page count: 5
Related papers
50 records in total
  • [11] Improved Cross-Corpus Speech Emotion Recognition Using Deep Local Domain Adaptation
    Zhao Huijuan
    Ye Ning
    Wang Ruchuan
    CHINESE JOURNAL OF ELECTRONICS, 2023, 32 (03) : 640 - 646
  • [12] Adversarial Domain Generalized Transformer for Cross-Corpus Speech Emotion Recognition
    Gao, Yuan
    Wang, Longbiao
    Liu, Jiaxing
    Dang, Jianwu
    Okada, Shogo
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (02) : 697 - 708
  • [13] A CROSS-CORPUS STUDY ON SPEECH EMOTION RECOGNITION
    Milner, Rosanna
    Jalal, Md Asif
    Ng, Raymond W. M.
    Hain, Thomas
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 304 - 311
  • [14] Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives
    Zhang, Shiqing
    Liu, Ruixin
    Tao, Xin
    Zhao, Xiaoming
    FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [15] Cross-corpus speech emotion recognition using semi-supervised domain adaptation network
    Zhang, Yumei
    Jia, Maoshen
    Cao, Xuan
    Ru, Jiawei
    Zhang, Xinfeng
    SPEECH COMMUNICATION, 2025, 168
  • [16] Analysis of Deep Learning Architectures for Cross-corpus Speech Emotion Recognition
    Parry, Jack
    Palaz, Dimitri
    Clarke, Georgia
    Lecomte, Pauline
    Mead, Rebecca
    Berger, Michael
    Hofer, Gregor
    INTERSPEECH 2019, 2019, : 1656 - 1660
  • [17] Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition
    Lu, Cheng
    Tang, Chuangao
    Zhang, Jiacheng
    Zong, Yuan
    ENTROPY, 2022, 24 (08)
  • [18] Towards Domain-Specific Cross-Corpus Speech Emotion Recognition Approach
    Zhao, Yan
    Zong, Yuan
    Lian, Hailun
    Lu, Cheng
    Shi, Jingang
    Zheng, Wenming
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024,
  • [19] Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation
    Fu, Hongliang
    Li, Qianqian
    Tao, Huawei
    Zhu, Chunhua
    Xie, Yue
    Guo, Ruxue
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (08) : 1097 - 1100
  • [20] Cross-corpus speech emotion recognition using subspace learning and domain adaption
    Cao, Xuan
    Jia, Maoshen
    Ru, Jiawei
    Pai, Tun-wen
    EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, 2022