Layer-Adapted Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

被引:0
|
作者
Zhao, Yan [1 ,2 ]
Zong, Yuan [1 ,3 ]
Wang, Jincen [1 ]
Lian, Hailun [1 ,2 ]
Lu, Cheng [1 ]
Zhao, Li [1 ,2 ]
Zheng, Wenming [1 ,3 ]
机构
[1] Southeast Univ, Sch Biol Sci & Med Engn, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
[3] Pazhou Lab, Guangzhou 510320, Peoples R China
基金
中国博士后科学基金;
关键词
Cross-corpus speech emotion recognition; deep learning; speech emotion recognition (SER); transfer learning; unsupervised domain adaptation (DA); FEATURES; KERNEL;
D O I
10.1109/TCSS.2024.3362690
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
In this article, we propose a new unsupervised domain adaptation (DA) method called layer-adapted implicit distribution alignment networks (LIDANs) to address the challenge of cross-corpus speech emotion recognition (SER). LIDAN extends our previous ICASSP work, deep implicit distribution alignment networks (DIDANs), whose key contribution lies in the introduction of a novel regularization term called implicit distribution alignment (IDA). This term allows DIDAN trained on source (training) speech samples to remain applicable to predicting emotion labels for target (testing) speech samples, regardless of corpus variance in cross-corpus SER. To further enhance this method, we extend IDA to layer-adapted IDA (LIDA), resulting in LIDAN. This layer-adapted extension consists of three modified IDA terms that consider emotion labels at different levels of granularity. These terms are strategically arranged within different fully connected layers in LIDAN, aligning with the increasing emotion-discriminative abilities with respect to the layer depth. This arrangement enables LIDAN to more effectively learn emotion-discriminative and corpus-invariant features for SER across various corpora compared to DIDAN. It is also worthy to mention that unlike most existing methods that rely on estimating statistical moments to describe preassumed explicit distributions, both IDA and LIDA take a different approach. They utilize an idea of target sample reconstruction to directly bridge the feature distribution gap without making assumptions about their distribution type. As a result, DIDAN and LIDAN can be viewed as implicit cross-corpus SER methods. To evaluate LIDAN, we conducted extensive cross-corpus SER experiments on EmoDB, eNTERFACE, and CASIA corpora. The experimental results demonstrate that LIDAN surpasses recent state-of-theart explicit unsupervised DA methods in tackling cross-corpus SER tasks.
引用
收藏
页码:5419 / 5430
页数:12
相关论文
共 50 条
  • [1] Progressive distribution adapted neural networks for cross-corpus speech emotion recognition
    Zong, Yuan
    Lian, Hailun
    Zhang, Jiacheng
    Feng, Ercui
    Lu, Cheng
    Chang, Hongli
    Tang, Chuangao
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [2] Target-Adapted Subspace Learning for Cross-Corpus Speech Emotion Recognition
    Chen, Xiuzhen
    Zhou, Xiaoyan
    Lu, Cheng
    Zong, Yuan
    Zheng, Wenming
    Tang, Chuangao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (12) : 2632 - 2636
  • [3] A CROSS-CORPUS STUDY ON SPEECH EMOTION RECOGNITION
    Milner, Rosanna
    Jalal, Md Asif
    Ng, Raymond W. M.
    Hain, Thomas
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 304 - 311
  • [4] Cross-Corpus Speech Emotion Recognition Based on Hybrid Neural Networks
    Rehman, Abdul
    Liu, Zhen-Tao
    Li, Dan-Yun
    Wu, Bao-Han
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 7464 - 7468
  • [5] Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition
    Ye, Jiaxin
    Wei, Yujie
    Wen, Xin-Cheng
    Ma, Chenglong
    Huang, Zhizhong
    Liu, Kunhong
    Shan, Hongming
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5956 - 5965
  • [6] CROSS-CORPUS SPEECH EMOTION RECOGNITION USING JOINT DISTRIBUTION ADAPTIVE REGRESSION
    Zhang, Jiacheng
    Jiang, Lin
    Zong, Yuan
    Zheng, Wenming
    Zhao, Li
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3790 - 3794
  • [7] A STUDY ON CROSS-CORPUS SPEECH EMOTION RECOGNITION AND DATA AUGMENTATION
    Braunschweiler, Norbert
    Doddipatla, Rama
    Keizer, Simon
    Stoyanchev, Svetlana
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 24 - 30
  • [8] Low-rank joint distribution adaptation for cross-corpus speech emotion recognition
    Li, Sunan
    Lu, Cheng
    Zhao, Yan
    Lian, Hailun
    Qi, Tianhua
    Zong, Yuan
    KNOWLEDGE-BASED SYSTEMS, 2025, 315
  • [9] Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation
    Fu, Hongliang
    Li, Qianqian
    Tao, Huawei
    Zhu, Chunhua
    Xie, Yue
    Guo, Ruxue
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (08) : 1097 - 1100
  • [10] Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition
    Lu, Cheng
    Zong, Yuan
    Tang, Chuangao
    Lian, Hailun
    Chang, Hongli
    Zhu, Jie
    Li, Sunan
    Zhao, Yan
    ELECTRONICS, 2022, 11 (17)