Exploring corpus-invariant emotional acoustic feature for cross-corpus speech emotion recognition

Cited by: 0
Authors
Lian, Hailun [1 ,2 ]
Lu, Cheng [1 ,3 ]
Zhao, Yan [1 ,2 ]
Li, Sunan [1 ,2 ]
Qi, Tianhua [1 ,3 ]
Zong, Yuan [1 ,3 ]
Affiliations
[1] Southeast Univ, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
[3] Southeast Univ, Sch Biol Sci & Med Engn, Nanjing 210096, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Cross-corpus speech emotion recognition; Corpus-invariant emotional acoustic features; Speech emotion recognition; Transfer subspace learning; ADAPTATION;
DOI
10.1016/j.eswa.2024.125162
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Unsupervised cross-corpus speech emotion recognition (SER) is the task in which the labeled training (source) speech and the unlabeled testing (target) speech come from different corpora. Transfer subspace learning is one of the mainstream approaches to cross-corpus SER: it uses projection matrices to learn common, corpus-invariant feature representations shared by the source and target corpora. However, these methods mainly focus on modeling the mapping from the input low-level descriptor (LLD) feature space to a corpus-invariant emotional feature space. Because this mapping is implicit, it offers little interpretability for the selected features, and so it cannot pinpoint which acoustic features are corpus-invariant. To bridge this gap, we first propose a new transfer subspace learning framework with feature selection capability, the Corpus-Invariant Emotional Acoustic Feature Seeker (CAFS). Specifically, CAFS integrates two core terms into the transfer regression loss function: (1) an emotion preservation term, which combines emotion regression with the l2,1-norm and is mainly used to select features and ensure that they are related to emotion; and (2) a corpus invariance preservation term, which measures the difference in feature distributions between the source and target corpora. Minimizing this term narrows the gap between the source and target domains, ensuring that the selected acoustic features are corpus-invariant. We then conduct extensive cross-corpus SER experiments to explore corpus-invariant emotional acoustic features under commonly used acoustic feature sets (IS09 and eGeMAPS).
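The abstract does not give the CAFS objective in closed form, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes source features `xs` (samples × features), one-hot emotion labels `ys`, a projection `W` (features × classes), and, as a stand-in for the corpus invariance term, a linear-kernel MMD (the squared distance between projected source and target feature means). The smooth terms are minimized by proximal gradient descent, whose proximal step for the l2,1-norm shrinks whole rows of `W` toward zero; features are then ranked by their row norms. All function names, the hyperparameters `lam`, `mu`, `step`, `iters`, and the toy data are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def column_mean(x: Matrix) -> List[float]:
    """Mean of each feature (column) over all samples (rows)."""
    return [sum(col) / len(x) for col in zip(*x)]


def row_norms(w: Matrix) -> List[float]:
    """Euclidean norm of each row of W, i.e. of each feature's weights."""
    return [math.sqrt(sum(v * v for v in row)) for row in w]


def l21_norm(w: Matrix) -> float:
    """l2,1-norm: sum of the row norms, which encourages whole rows to be zero."""
    return sum(row_norms(w))


def mean_gap(xs: Matrix, xt: Matrix) -> List[float]:
    """Difference between source and target feature means."""
    return [s - t for s, t in zip(column_mean(xs), column_mean(xt))]


def objective(xs, ys, xt, w, lam, mu):
    # Emotion preservation: ||Xs W - Ys||_F^2 + lam * ||W||_{2,1}
    pred = matmul(xs, w)
    reg = sum((p - y) ** 2 for prow, yrow in zip(pred, ys) for p, y in zip(prow, yrow))
    # Corpus invariance (ASSUMED linear-kernel MMD): ||W^T (mean_s - mean_t)||^2
    m = mean_gap(xs, xt)
    proj = [sum(m[k] * w[k][j] for k in range(len(m))) for j in range(len(w[0]))]
    return reg + lam * l21_norm(w) + mu * sum(p * p for p in proj)


def fit(xs, ys, xt, lam=0.1, mu=1.0, step=0.01, iters=500):
    """Proximal gradient descent; step should stay below 1 / Lipschitz constant."""
    d, c = len(xs[0]), len(ys[0])
    w = [[0.0] * c for _ in range(d)]
    xs_t = transpose(xs)
    m = mean_gap(xs, xt)
    for _ in range(iters):
        pred = matmul(xs, w)
        resid = [[p - y for p, y in zip(prow, yrow)] for prow, yrow in zip(pred, ys)]
        grad = matmul(xs_t, resid)  # d x c, equals Xs^T (Xs W - Ys)
        proj = [sum(m[k] * w[k][j] for k in range(d)) for j in range(c)]
        for i in range(d):
            # Gradient step on the smooth terms (regression + MMD)
            row = [w[i][j] - step * (2 * grad[i][j] + 2 * mu * m[i] * proj[j])
                   for j in range(c)]
            # Proximal step for lam * ||W||_{2,1}: group soft-thresholding of row i
            norm = math.sqrt(sum(v * v for v in row))
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            w[i] = [scale * v for v in row]
    return w


def rank_features(w: Matrix) -> List[int]:
    """Feature indices sorted by row norm, largest (most relevant) first."""
    norms = row_norms(w)
    return sorted(range(len(w)), key=lambda i: norms[i], reverse=True)


if __name__ == "__main__":
    # Toy data (made up): feature 0 tracks the label; feature 1 is shifted in the target corpus.
    xs = [[1.0, 0.6], [1.0, 0.4], [-1.0, -0.4], [-1.0, -0.6]]
    ys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    xt = [[1.0, 2.5], [-1.0, 1.5]]
    w = fit(xs, ys, xt)
    print("ranking:", rank_features(w), "row norms:", row_norms(w))
```

Rows of `W` that the group soft-thresholding drives to zero correspond to discarded features. Swapping the linear-kernel MMD for another distribution-distance measure would change only the `mean_gap`/`proj` terms and their gradient.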
Statistical analysis of the acoustic features selected by the CAFS framework reveals that some of them (e.g., Mel-Frequency Cepstral Coefficients (MFCC) and formant features) are corpus-invariant, which could offer insights into feature selection for cross-corpus SER. These findings also lay a theoretical and empirical foundation for future research on, and applications of, cross-corpus SER.
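The abstract does not specify which statistic was applied to the selected features. One simple possibility, assumed here, is to count the fraction of cross-corpus tasks (source/target pairs) in which each feature survives selection and to treat features selected in every task as candidates for corpus invariance. The function names and the example selections below are hypothetical placeholders, not the paper's results.

```python
from collections import Counter
from typing import Dict, List, Tuple


def selection_frequency(selected: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Fraction of cross-corpus tasks in which each feature was selected, most frequent first."""
    counts: Counter = Counter()
    for features in selected.values():
        counts.update(set(features))  # count a feature at most once per task
    n_tasks = len(selected)
    return sorted(((f, c / n_tasks) for f, c in counts.items()), key=lambda p: (-p[1], p[0]))


def stable_features(selected: Dict[str, List[str]], min_fraction: float = 1.0) -> List[str]:
    """Features selected in at least `min_fraction` of the tasks."""
    return [f for f, frac in selection_frequency(selected) if frac >= min_fraction]


if __name__ == "__main__":
    # Hypothetical per-task selections; task and feature names are placeholders, not results.
    selected = {
        "task_1": ["feat_a", "feat_b"],
        "task_2": ["feat_a", "feat_c"],
        "task_3": ["feat_a", "feat_b"],
    }
    print(selection_frequency(selected))
    print(stable_features(selected, min_fraction=0.5))
```

Requiring `min_fraction=1.0` keeps only features chosen in every task; lowering it trades strictness for a longer candidate list.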
Pages: 11