Exploring corpus-invariant emotional acoustic feature for cross-corpus speech emotion recognition

Times cited: 0
Authors
Lian, Hailun [1, 2]
Lu, Cheng [1, 3]
Zhao, Yan [1, 2]
Li, Sunan [1, 2]
Qi, Tianhua [1, 3]
Zong, Yuan [1, 3]
Affiliations
[1] Southeast Univ, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
[3] Southeast Univ, Sch Biol Sci & Med Engn, Nanjing 210096, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Cross-corpus speech emotion recognition; Corpus-invariant emotional acoustic features; Speech emotion recognition; Transfer subspace learning; ADAPTATION;
DOI
10.1016/j.eswa.2024.125162
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unsupervised cross-corpus speech emotion recognition (SER) is the task in which the labeled training (source) speech and the unlabeled testing (target) speech come from different corpora. Subspace transfer learning is one of the mainstream approaches to cross-corpus SER: it uses projection matrices to learn corpus-invariant feature representations shared by the source and target corpora. However, these methods mainly model the mapping from the input low-level descriptor (LLD) feature space to the corpus-invariant emotional feature space, an implicit mapping that offers little interpretability for the selected features. As a result, they cannot pinpoint which acoustic features are corpus-invariant. To bridge this gap, we first propose a new transfer subspace learning framework with feature selection capabilities, the Corpus-Invariant Emotional Acoustic Feature Seeker (CAFS). Specifically, CAFS integrates two core terms into the transfer regression loss function: (1) the emotion preservation term, which comprises emotional regression and the $\ell_{2,1}$ norm and is mainly used to select features and ensure that they are related to emotion; and (2) the corpus invariance preservation term, which mainly measures the difference in feature distribution between the source and target corpora. Minimizing this second term bridges the gap between the source and target domains, ensuring that the chosen acoustic features are corpus-invariant. Subsequently, we conducted extensive cross-corpus SER experiments to explore corpus-invariant emotional acoustic features under commonly used acoustic feature sets (IS09 and eGeMAPS). Through statistical analysis of the acoustic features sought by the CAFS framework, some acoustic features (e.g., Mel-frequency cepstral coefficients (MFCC) and formants) reveal their corpus-invariant properties, which could provide insights for feature selection in cross-corpus SER. These findings also lay a theoretical and empirical foundation for future research and applications in cross-corpus SER.
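The two-term loss described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's formulation, which the abstract does not give. It assumes a linear projection matrix `P` (one row per acoustic feature), a least-squares regression onto one-hot emotion labels, and a simple mean-difference measure standing in for the distribution gap between corpora. The function names and the weights `lam` and `mu` are hypothetical.

```python
import numpy as np

def l21_norm(P):
    """Sum of the Euclidean norms of the rows of P (one row per input feature)."""
    return float(np.sum(np.linalg.norm(P, axis=1)))

def mean_discrepancy(P, Xs, Xt):
    """Squared distance between the projected source and target feature means.

    Assumption: a simple stand-in for a distribution-difference measure.
    """
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)  # shape (d,)
    return float(np.sum((diff @ P) ** 2))

def objective(P, Xs, Ys, Xt, lam=0.1, mu=1.0):
    """Illustrative loss: regression + lam * l2,1 sparsity + mu * corpus gap.

    Xs: (n_s, d) source features, Ys: (n_s, c) one-hot emotion labels,
    Xt: (n_t, d) unlabeled target features, P: (d, c) projection matrix.
    """
    regression = float(np.sum((Xs @ P - Ys) ** 2))
    return regression + lam * l21_norm(P) + mu * mean_discrepancy(P, Xs, Xt)

def rank_features(P):
    """Feature indices ordered by decreasing row norm of P.

    Rows driven to zero by the l2,1 penalty correspond to discarded features.
    """
    return np.argsort(-np.linalg.norm(P, axis=1))
```

In this sketch, the row-wise sparsity encouraged by the l2,1 penalty is what makes the result interpretable: a feature whose row in `P` has a large norm contributes to the emotion regression, and a feature whose row is near zero is effectively unselected.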
Pages: 11