Exploring corpus-invariant emotional acoustic feature for cross-corpus speech emotion recognition

Cited by: 0
Authors
Lian, Hailun [1, 2]
Lu, Cheng [1, 3]
Zhao, Yan [1, 2]
Li, Sunan [1, 2]
Qi, Tianhua [1, 3]
Zong, Yuan [1, 3]
Affiliations
[1] Southeast Univ, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
[3] Southeast Univ, Sch Biol Sci & Med Engn, Nanjing 210096, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Cross-corpus speech emotion recognition; Corpus-invariant emotional acoustic features; Speech emotion recognition; Transfer subspace learning; ADAPTATION;
DOI
10.1016/j.eswa.2024.125162
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Unsupervised cross-corpus speech emotion recognition (SER) is the task in which the labeled training (source) speech and the unlabeled testing (target) speech come from different corpora. Subspace transfer learning, one of the mainstream approaches to cross-corpus SER, uses projection matrices to learn common corpus-invariant feature representations shared by the source and target corpora. However, these methods mainly focus on modeling the mapping from the input low-level descriptor (LLD) feature space to the corpus-invariant emotional feature space, which is an implicit feature mapping process that lacks interpretability for the selected features. As a result, they cannot pinpoint which acoustic features are corpus-invariant. To bridge this gap, we propose a new transfer subspace learning framework with feature selection capabilities, i.e., the Corpus-Invariant Emotional Acoustic Feature Seeker (CAFS). Specifically, the CAFS integrates two core terms into the transfer regression loss function: (1) the emotion preservation term, which combines emotional regression with the $\ell_{2,1}$ norm and is mainly used to select features and ensure that these features are related to emotions; and (2) the corpus invariance preservation term, which measures the difference in feature distribution between the source and target corpora. Minimizing the second term bridges the gap between the source and target domains, ensuring that the chosen acoustic features are corpus-invariant. Subsequently, we conducted extensive cross-corpus SER experiments to explore corpus-invariant emotional acoustic features under two commonly used acoustic feature sets (IS09 and eGeMAPS). Through statistical analysis of the acoustic features sought by the CAFS framework, some acoustic features (e.g., Mel-Frequency Cepstral Coefficients (MFCC) and formants) reveal their corpus-invariant properties, which could provide insights for feature selection in cross-corpus SER. These findings also provide a theoretical and empirical foundation for future research and applications in cross-corpus SER.
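The abstract names the two terms of the CAFS loss but does not give its exact form, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a least-squares regression onto one-hot emotion labels, a row-wise l2,1 penalty on the projection matrix, and a simple difference of projected feature means as the distribution gap. All function names and the weights `lam` and `mu` are hypothetical.

```python
import numpy as np


def l21_norm(P):
    """l2,1 norm: sum of the Euclidean norms of the rows of P.

    Each row of P corresponds to one input acoustic feature, so
    penalizing this norm pushes whole rows (features) toward zero.
    """
    return float(np.sum(np.linalg.norm(P, axis=1)))


def mean_discrepancy(Xs, Xt, P):
    """Assumed distribution gap: squared distance between the projected
    source and target feature means (a stand-in for the paper's term)."""
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(np.sum((diff @ P) ** 2))


def cafs_like_objective(Xs, Ys, Xt, P, lam=0.1, mu=1.0):
    """Hypothetical objective with the two terms named in the abstract.

    Xs: (n_s, d) labeled source features, Ys: (n_s, c) one-hot labels,
    Xt: (n_t, d) unlabeled target features, P: (d, c) projection matrix.
    """
    # Emotion preservation: regression loss plus l2,1 sparsity.
    emotion = float(np.sum((Xs @ P - Ys) ** 2)) + lam * l21_norm(P)
    # Corpus invariance preservation: source/target distribution gap.
    invariance = mean_discrepancy(Xs, Xt, P)
    return emotion + mu * invariance


def rank_features(P):
    """Order feature indices by row norm of P, largest first."""
    return np.argsort(-np.linalg.norm(P, axis=1))
```

Under these assumptions, features whose rows in a learned `P` keep a large norm would be the ones read as both emotion-related and corpus-invariant, which is how `rank_features` orders them.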
Pages: 11
Related papers
50 in total
  • [1] Learning Corpus-Invariant Discriminant Feature Representations for Speech Emotion Recognition
    Song, Peng
    Ou, Shifeng
    Du, Zhenbin
    Guo, Yanyan
    Ma, Wenming
    Liu, Jinglei
    Zheng, Wenming
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D (05) : 1136 - 1139
  • [2] A Cross-Corpus Recognition of Emotional Speech
    Xiao, Zhongzhe
    Wu, Di
    Zhang, Xiaojun
    Tao, Zhi
    PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2016, : 42 - 46
  • [3] A CROSS-CORPUS STUDY ON SPEECH EMOTION RECOGNITION
    Milner, Rosanna
    Jalal, Md Asif
    Ng, Raymond W. M.
    Hain, Thomas
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 304 - 311
  • [4] Cross-Corpus Acoustic Emotion Recognition: Variances and Strategies
    Schuller, Bjoern
    Vlasenko, Bogdan
    Eyben, Florian
    Woellmer, Martin
    Stuhlsatz, Andre
    Wendemuth, Andreas
    Rigoll, Gerhard
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2010, 1 (02) : 119 - 131
  • [5] A Novel DBN Feature Fusion Model for Cross-Corpus Speech Emotion Recognition
    Zou Cairong
    Zhang Xinran
    Zha Cheng
    Zhao Li
    JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, 2016, 2016
  • [6] Efficient and effective strategies for cross-corpus acoustic emotion recognition
    Kaya, Heysem
    Karpov, Alexey A.
    NEUROCOMPUTING, 2018, 275 : 1028 - 1034
  • [7] A STUDY ON CROSS-CORPUS SPEECH EMOTION RECOGNITION AND DATA AUGMENTATION
    Braunschweiler, Norbert
    Doddipatla, Rama
    Keizer, Simon
    Stoyanchev, Svetlana
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 24 - 30
  • [8] DOMAIN-INVARIANT FEATURE LEARNING FOR CROSS CORPUS SPEECH EMOTION RECOGNITION
    Gao, Yuan
    Okada, Shogo
    Wang, Longbiao
    Liu, Jiaxing
    Dang, Jianwu
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6427 - 6431
  • [9] Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation
    Fu, Hongliang
    Li, Qianqian
    Tao, Huawei
    Zhu, Chunhua
    Xie, Yue
    Guo, Ruxue
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (08) : 1097 - 1100
  • [10] Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition
    Lu, Cheng
    Zong, Yuan
    Tang, Chuangao
    Lian, Hailun
    Chang, Hongli
    Zhu, Jie
    Li, Sunan
    Zhao, Yan
    ELECTRONICS, 2022, 11 (17)