Layer-Adapted Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

Cited by: 0

Authors:

Zhao, Yan [1,2]

Zong, Yuan [1,3]

Wang, Jincen [1]

Lian, Hailun [1,2]

Lu, Cheng [1]

Zhao, Li [1,2]

Zheng, Wenming [1,3]

Affiliations:

[1] Southeast Univ, Sch Biol Sci & Med Engn, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Peoples R China

[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China

[3] Pazhou Lab, Guangzhou 510320, Peoples R China

Source:

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024 / Vol. 11 / No. 04

Funding:

China Postdoctoral Science Foundation;

Keywords:

Cross-corpus speech emotion recognition; deep learning; speech emotion recognition (SER); transfer learning; unsupervised domain adaptation (DA); FEATURES; KERNEL;

DOI:

10.1109/TCSS.2024.3362690

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

In this article, we propose a new unsupervised domain adaptation (DA) method called layer-adapted implicit distribution alignment networks (LIDANs) to address the challenge of cross-corpus speech emotion recognition (SER). LIDAN extends our previous ICASSP work, deep implicit distribution alignment networks (DIDANs), whose key contribution is a novel regularization term called implicit distribution alignment (IDA). This term allows a DIDAN trained on source (training) speech samples to remain applicable to predicting emotion labels for target (testing) speech samples, regardless of corpus variance in cross-corpus SER. To further enhance this method, we extend IDA to layer-adapted IDA (LIDA), resulting in LIDAN. This layer-adapted extension consists of three modified IDA terms that consider emotion labels at different levels of granularity. These terms are placed in different fully connected layers of LIDAN, in line with the increasing emotion-discriminative ability of deeper layers. This arrangement enables LIDAN to learn emotion-discriminative and corpus-invariant features for SER across various corpora more effectively than DIDAN. It is also worth mentioning that, unlike most existing methods, which rely on estimating statistical moments to describe preassumed explicit distributions, both IDA and LIDA take a different approach. They use target sample reconstruction to directly bridge the feature distribution gap without making assumptions about the distribution type. As a result, DIDAN and LIDAN can be viewed as implicit cross-corpus SER methods. To evaluate LIDAN, we conducted extensive cross-corpus SER experiments on the EmoDB, eNTERFACE, and CASIA corpora. The experimental results demonstrate that LIDAN surpasses recent state-of-the-art explicit unsupervised DA methods in tackling cross-corpus SER tasks.
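The abstract describes the reconstruction idea only at a high level, so the following is a minimal sketch of what a reconstruction-based alignment penalty could look like, not the authors' IDA or LIDA formulation. It assumes each target feature vector is approximated by a linear combination of source feature vectors, with a ridge penalty on the combination weights. The function name `reconstruction_alignment_loss`, the `ridge` parameter, and the closed-form solve are all illustrative assumptions that do not appear in the record.

```python
import numpy as np


def reconstruction_alignment_loss(source_feats, target_feats, ridge=1e-2):
    """Hypothetical reconstruction-based alignment penalty.

    NOTE: an illustrative assumption, not the IDA/LIDA term from the paper.

    source_feats: array of shape (n_s, d), features of source samples.
    target_feats: array of shape (n_t, d), features of target samples.
    Returns (loss, W), where W has shape (n_t, n_s) and row i holds the
    weights that rebuild target sample i from the source samples.
    """
    S = np.asarray(source_feats, dtype=float)
    T = np.asarray(target_feats, dtype=float)

    # Minimise ||T - W S||_F^2 + ridge * ||W||_F^2 over W.
    # Closed form: W = T S^T (S S^T + ridge * I)^{-1}.
    gram = S @ S.T + ridge * np.eye(S.shape[0])
    W = np.linalg.solve(gram, S @ T.T).T

    # A small residual means the target features are well explained by the
    # source features; no distribution family is assumed anywhere.
    residual = T - W @ S
    return float(np.mean(residual ** 2)), W
```

In a network, a penalty of this kind would typically be added to the classification loss on a chosen layer's features. Which layers to use and how emotion labels enter the term are specific to the paper and are not reproduced here.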


Pages: 5419-5430

Page count: 12
