Exploring corpus-invariant emotional acoustic feature for cross-corpus speech emotion recognition

Cited by: 0

Authors:

Lian, Hailun ^{[1,2]}

Lu, Cheng ^{[1,3]}

Zhao, Yan ^{[1,2]}

Li, Sunan ^{[1,2]}

Qi, Tianhua ^{[1,3]}

Zong, Yuan ^{[1,3]}

Affiliations:

[1] Southeast Univ, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Peoples R China

[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China

[3] Southeast Univ, Sch Biol Sci & Med Engn, Nanjing 210096, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Volume 258

Funding:

China Postdoctoral Science Foundation;

Keywords:

Cross-corpus speech emotion recognition; Corpus-invariant emotional acoustic features; Speech emotion recognition; Transfer subspace learning; ADAPTATION;

DOI:

10.1016/j.eswa.2024.125162

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Unsupervised cross-corpus speech emotion recognition (SER) is the task in which the labeled training (source) speech and the unlabeled testing (target) speech come from different corpora. Transfer subspace learning is one of the mainstream approaches to cross-corpus SER; it uses projection matrices to learn corpus-invariant feature representations shared by the source and target corpora. However, these methods mainly model the mapping from the input low-level descriptor (LLD) feature space to the corpus-invariant emotional feature space. This mapping is implicit and offers no interpretation of the selected features, so it cannot pinpoint which acoustic features are corpus-invariant. To bridge this gap, we first propose a new transfer subspace learning framework with feature selection capabilities, the Corpus-Invariant Emotional Acoustic Feature Seeker (CAFS). Specifically, CAFS integrates two core terms into the transfer regression loss function: (1) the emotion preservation term, which comprises emotional regression and the $\ell_{2,1}$ norm and is mainly used to select features and ensure that they are related to emotion; and (2) the corpus invariance preservation term, which measures the difference in feature distribution between the source and target corpora. Minimizing this second term bridges the gap between the source and target domains, ensuring that the chosen acoustic features are corpus-invariant. Subsequently, we conduct extensive cross-corpus SER experiments to explore corpus-invariant emotional acoustic features under commonly used acoustic feature sets (IS09 and eGeMAPS). Statistical analysis of the acoustic features sought by the CAFS framework shows that some acoustic features (e.g., Mel-Frequency Cepstral Coefficients (MFCC) and formants) exhibit corpus-invariant properties, which could provide insights for feature selection in cross-corpus SER. These findings also lay a theoretical and empirical foundation for future research and applications in cross-corpus SER.
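The two terms described in the abstract can be illustrated with a small sketch. The abstract does not give the exact CAFS loss, so everything below is an assumption for illustration only: the objective `||Xs U - Ys||_F^2 + lam * ||U||_{2,1} + mu * ||(mean(Xs) - mean(Xt)) U||^2`, the use of a simple mean difference as a stand-in for the distribution-difference measure, the iteratively reweighted least-squares solver, and the names `l21_norm`, `fit_feature_seeker_sketch`, `lam`, and `mu`. It is not the authors' implementation.

```python
import numpy as np


def l21_norm(U):
    """l2,1 norm: sum of the Euclidean norms of the rows of U."""
    return float(np.sum(np.linalg.norm(U, axis=1)))


def fit_feature_seeker_sketch(Xs, Ys, Xt, lam=1.0, mu=1.0, n_iter=50, eps=1e-8):
    """Hypothetical sketch of an l2,1-regularized transfer regression.

    Xs: (n_s, d) labeled source features
    Ys: (n_s, c) one-hot source emotion labels
    Xt: (n_t, d) unlabeled target features
    Returns U of shape (d, c); rows with large norm mark selected features.
    """
    d = Xs.shape[1]
    # Assumed stand-in for the corpus-invariance term: difference of the
    # source and target feature means (a linear-kernel, MMD-style measure).
    delta = (Xs.mean(axis=0) - Xt.mean(axis=0)).reshape(-1, 1)
    A = Xs.T @ Xs + mu * (delta @ delta.T)
    B = Xs.T @ Ys
    # Ridge solution as a starting point.
    U = np.linalg.solve(A + lam * np.eye(d), B)
    for _ in range(n_iter):
        # Reweighting step: the l2,1 penalty is replaced by a weighted
        # quadratic whose weights depend on the current row norms.
        row_norms = np.maximum(np.linalg.norm(U, axis=1), eps)
        D = np.diag(1.0 / (2.0 * row_norms))
        U = np.linalg.solve(A + lam * D, B)
    return U
```

Under these assumptions, features could be ranked by the row norms of the returned matrix, for example with `np.argsort(-np.linalg.norm(U, axis=1))`; a feature whose mean differs strongly between corpora is penalized by the `mu` term, while a feature unrelated to the labels is shrunk by the `lam` term.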


Pages: 11
