Exploring corpus-invariant emotional acoustic feature for cross-corpus speech emotion recognition

Times cited: 0

Authors:

Lian, Hailun [1,2]

Lu, Cheng [1,3]

Zhao, Yan [1,2]

Li, Sunan [1,2]

Qi, Tianhua [1,3]

Zong, Yuan [1,3]

Affiliations:

[1] Southeast Univ, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Peoples R China

[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China

[3] Southeast Univ, Sch Biol Sci & Med Engn, Nanjing 210096, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 258

Funding:

China Postdoctoral Science Foundation;

Keywords:

Cross-corpus speech emotion recognition; Corpus-invariant emotional acoustic features; Speech emotion recognition; Transfer subspace learning; ADAPTATION;

DOI:

10.1016/j.eswa.2024.125162

CLC classification number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Unsupervised cross-corpus speech emotion recognition (SER) is the task in which the labeled training (source) speech and the unlabeled testing (target) speech come from different corpora. Transfer subspace learning is one of the mainstream approaches to cross-corpus SER: it uses projection matrices to learn corpus-invariant feature representations shared by the source and target corpora. However, these methods mainly model the mapping from the input low-level descriptor (LLD) feature space to a corpus-invariant emotional feature space. This mapping is implicit and offers little interpretability with respect to the selected features, so it cannot pinpoint which acoustic features are corpus-invariant. To bridge this gap, we first propose a new transfer subspace learning framework with feature selection capability, the Corpus-Invariant Emotional Acoustic Feature Seeker (CAFS). Specifically, CAFS integrates two core terms into the transfer regression loss function: (1) an emotion preservation term, which combines emotion regression with the $\ell_{2,1}$ norm and is used to select features and ensure that they are related to emotion; and (2) a corpus invariance preservation term, which measures the difference in feature distribution between the source and target corpora. Minimizing this term bridges the gap between the source and target domains, ensuring that the selected acoustic features are corpus-invariant. We then conduct extensive cross-corpus SER experiments to explore corpus-invariant emotional acoustic features under commonly used acoustic feature sets (IS09 and eGeMAPS).
Statistical analysis of the acoustic features selected by CAFS shows that some of them, e.g., Mel-frequency cepstral coefficients (MFCCs) and formants, exhibit corpus invariance, which could guide feature selection in cross-corpus SER. These findings also lay a solid theoretical and empirical foundation for future research and applications in cross-corpus SER.
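The abstract names the ingredients of the CAFS objective without giving their exact form. As a rough illustration only, the Python/NumPy sketch that follows assumes one hypothetical instantiation that is not the authors' formulation: a linear projection `W` fitted by least squares to one-hot source emotion labels, an $\ell_{2,1}$ penalty on the rows of `W`, and a corpus term taken to be the squared distance between the projected source and target means (a linear-kernel maximum mean discrepancy). The objective is minimized by iteratively reweighted least squares, a common solver for $\ell_{2,1}$-regularized regression, and features are ranked by the row norms of `W`, so rows driven toward zero correspond to discarded features. All function names, the synthetic data, and the weights `lam` and `mu` are illustrative assumptions.

```python
import numpy as np


def l21_norm(W):
    """l_{2,1} norm: sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def mean_discrepancy(Xs, Xt, W):
    """ASSUMED corpus term: squared gap between projected corpus means.

    A linear-kernel MMD, ||(mean(Xs) - mean(Xt)) W||^2, used as a
    hypothetical stand-in for the paper's distribution-difference term.
    """
    delta = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(np.sum((delta @ W) ** 2))


def objective(Xs, Ys, Xt, W, lam, mu):
    """Regression loss + lam * l_{2,1}(W) + mu * mean discrepancy."""
    fit = np.sum((Xs @ W - Ys) ** 2)
    return float(fit + lam * l21_norm(W) + mu * mean_discrepancy(Xs, Xt, W))


def fit_sketch(Xs, Ys, Xt, lam, mu, n_iter=50, eps=1e-8):
    """Minimize `objective` by iteratively reweighted least squares.

    With D = diag(1 / (2 ||w_i||)) held fixed, the gradient vanishes at
    (Xs^T Xs + lam * D + mu * delta delta^T) W = Xs^T Ys.
    """
    delta = (Xs.mean(axis=0) - Xt.mean(axis=0))[:, None]  # d x 1
    gram = Xs.T @ Xs
    rhs = Xs.T @ Ys
    corpus = mu * (delta @ delta.T)
    # Ridge-style starting point so every row of W is nonzero.
    W = np.linalg.solve(gram + corpus + lam * np.eye(gram.shape[0]), rhs)
    for _ in range(n_iter):
        # eps keeps the reweighting finite when a row of W shrinks to zero.
        D = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps))
        W = np.linalg.solve(gram + lam * D + corpus, rhs)
    return W


def rank_features(W):
    """Feature indices ordered by the row norms of W, largest first."""
    return np.argsort(-np.linalg.norm(W, axis=1))


# Synthetic toy data (not from the paper): feature 0 carries the emotion
# cue in both corpora; feature 1 carries it too but is offset in the target.
rng = np.random.default_rng(0)
n_s, n_t, d = 2000, 1000, 6
labels = rng.integers(0, 2, size=n_s)
Ys = np.eye(2)[labels]
Xs = rng.normal(size=(n_s, d))
Xs[:, 0] += 3.0 * (2 * labels - 1)
Xs[:, 1] += 3.0 * (2 * labels - 1)
Xt = rng.normal(size=(n_t, d))
Xt[:, 0] += 3.0 * rng.choice([-1, 1], size=n_t)
Xt[:, 1] += 3.0 * rng.choice([-1, 1], size=n_t) + 5.0

# Center with source statistics so no feature doubles as an intercept.
src_mean = Xs.mean(axis=0)
Xs, Xt = Xs - src_mean, Xt - src_mean
Ys = Ys - Ys.mean(axis=0)

W0 = fit_sketch(Xs, Ys, Xt, lam=50.0, mu=0.0)     # without corpus term
W1 = fit_sketch(Xs, Ys, Xt, lam=50.0, mu=1000.0)  # with corpus term
print(rank_features(W1)[:3])
```

On this synthetic data, raising `mu` moves weight away from feature 1, whose mean differs between the two corpora, toward feature 0, which carries the same cue in both. That is the intended effect of a corpus invariance term, demonstrated here only under the stated assumptions.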

Pages: 11
