Exploring corpus-invariant emotional acoustic feature for cross-corpus speech emotion recognition

Cited by: 0

Authors:

Lian, Hailun [1,2]

Lu, Cheng [1,3]

Zhao, Yan [1,2]

Li, Sunan [1,2]

Qi, Tianhua [1,3]

Zong, Yuan [1,3]

Affiliations:

[1] Southeast Univ, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Peoples R China

[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China

[3] Southeast Univ, Sch Biol Sci & Med Engn, Nanjing 210096, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 258

Funding:

China Postdoctoral Science Foundation;

Keywords:

Cross-corpus speech emotion recognition; Corpus-invariant emotional acoustic features; Speech emotion recognition; Transfer subspace learning; ADAPTATION;

DOI:

10.1016/j.eswa.2024.125162

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Unsupervised cross-corpus speech emotion recognition (SER) is the task in which the labeled training (source) speech and the unlabeled testing (target) speech come from different corpora. Transfer subspace learning is one of the mainstream approaches to cross-corpus SER; it uses projection matrices to learn common corpus-invariant feature representations between the source and target corpora. However, these methods mainly model the mapping from the input low-level descriptor (LLD) feature space to the corpus-invariant emotional feature space, an implicit feature mapping that offers no interpretability for the selected features. As a result, it is not possible to pinpoint which acoustic features possess corpus invariance. To bridge this gap, we first propose a new transfer subspace learning framework with feature selection capabilities, i.e., the Corpus-Invariant Emotional Acoustic Feature Seeker (CAFS). Specifically, CAFS integrates two core terms into the transfer regression loss function: (1) the emotion preservation term, which comprises emotional regression and the $\ell_{2,1}$ norm and is mainly used to select features and ensure that these features are related to emotions; (2) the corpus invariance preservation term, which measures the difference in feature distribution between the source and target corpora. Minimizing this term bridges the gap between the source and target domains, ensuring that the chosen acoustic features are corpus-invariant. Subsequently, we conduct extensive cross-corpus SER experiments to explore corpus-invariant emotional acoustic features under commonly used acoustic feature sets (IS09 and eGeMAPS). Through statistical analysis of the acoustic features sought by the CAFS framework, some acoustic features (e.g., Mel-Frequency Cepstral Coefficients (MFCC) and formants) reveal their corpus-invariant properties, which could provide insights for feature selection in cross-corpus SER. These findings also lay a theoretical and empirical foundation for future research and applications in cross-corpus SER.
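The two-term loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: a least-squares regression onto one-hot emotion labels, an $\ell_{2,1}$ penalty on the rows of a projection matrix `U` (one row per acoustic feature), and a simple squared difference of projected feature means standing in for the distribution-difference measure. The function names and the weights `lam` and `mu` are hypothetical and are not taken from the paper.

```python
import numpy as np


def l21_norm(U):
    """Sum of the Euclidean norms of the rows of U (one row per acoustic feature)."""
    return float(np.sum(np.linalg.norm(U, axis=1)))


def mean_discrepancy(Xs, Xt, U):
    """Squared distance between projected source and target feature means.

    Assumption: a simple mean-matching measure used here as a stand-in for
    the distribution difference; the paper's actual measure may differ.
    """
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)  # shape (d,)
    return float(np.sum((diff @ U) ** 2))


def cafs_style_objective(Xs, Ys, Xt, U, lam=1.0, mu=1.0):
    """Illustrative objective: regression + lam * l2,1 + mu * discrepancy.

    Xs: (n_s, d) labeled source features, Ys: (n_s, c) one-hot labels,
    Xt: (n_t, d) unlabeled target features, U: (d, c) projection matrix.
    """
    regression = float(np.sum((Xs @ U - Ys) ** 2))
    return regression + lam * l21_norm(U) + mu * mean_discrepancy(Xs, Xt, U)


def rank_features(U):
    """Order feature indices by decreasing row norm of U."""
    return np.argsort(-np.linalg.norm(U, axis=1))
```

Under these assumptions, the $\ell_{2,1}$ penalty pushes whole rows of `U` toward zero, so the features whose rows keep a large norm after optimization are the ones the model relies on. Ranking rows by norm, as in `rank_features`, is one plausible way to read off the selected acoustic features.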


Pages: 11
