Common Discriminative Latent Space Learning for Cross-Domain Speech Emotion Recognition

Times Cited: 0
Authors
Fu, Siqi [1]
Song, Peng [1]
Wang, Hao [1]
Liu, Zhaowei [1]
Zheng, Wenming [2]
Affiliations
[1] Yantai Univ, Sch Comp & Control Engn, Yantai 264005, Peoples R China
[2] Southeast Univ, Key Lab Child Dev & Learning Sci, Minist Educ, Nanjing 210096, Peoples R China
Source
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024
Keywords
Domain adaptation; latent space learning (LSL); regression; speech emotion recognition (SER); LEAST-SQUARES REGRESSION; CLASSIFICATION;
DOI
10.1109/TCSS.2024.3476325
CLC Classification Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Cross-domain speech emotion recognition (SER) has received increasing attention in recent years. Existing transfer subspace learning and regression-based SER methods suffer from two drawbacks: the features in the learned subspace are insufficiently representative and discriminative, and direct regression leads to information loss. To address these problems, we present a novel common discriminative latent space learning (CDLSL) method for cross-domain SER. Specifically, we first obtain a common latent space by imposing a projection matrix on the cross-domain data, together with an uncorrelated constraint on the projection matrix to ensure that the features remain representative and discriminative after dimension reduction. Then, we impose a graph regularization term on the latent representations of the samples to capture local similarity information. Furthermore, to obtain a more discriminative common latent space, we introduce label information by aligning the latent space with a relaxed label space, which mitigates the information loss of direct regression. Extensive experimental results validate the superiority of the proposed method over state-of-the-art competitors.
Pages: 11
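
The abstract outlines an objective with three ingredients: a projection onto a common latent space under an uncorrelated constraint, a graph regularization term over the latent representations, and an alignment of the latent space with a relaxed label space. Below is a minimal numerical sketch of such an objective, assuming a k-NN heat-kernel similarity graph and the epsilon-dragging relaxed labels of discriminative least squares regression; all function names and the exact form of the objective are illustrative assumptions, not the authors' formulation or code.

# Hypothetical sketch of a CDLSL-style objective. The exact formulation in
# the paper may differ; symbols follow the abstract (projection P, graph
# Laplacian L, relaxed label targets), everything else is assumed.
import numpy as np

def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized graph Laplacian L = D - S of a k-NN heat-kernel graph.

    X: (d, n) data matrix, one sample per column.
    """
    col_norms = np.sum(X ** 2, axis=0)
    dist2 = col_norms[:, None] + col_norms[None, :] - 2.0 * X.T @ X
    n = X.shape[1]
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[1:k + 1]          # skip the sample itself
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / (2.0 * sigma ** 2))
    S = np.maximum(S, S.T)                            # symmetrize the graph
    return np.diag(S.sum(axis=1)) - S

def cdlsl_objective(P, Q, X, Xs, Ys, B, M, lam=1.0, gamma=0.1):
    """Value of an assumed CDLSL-style objective (smaller is better).

    P : (d, r) projection onto the common latent space
    Q : (r, c) regression matrix from latent space to label space
    X : (d, n) pooled source + target data (for graph regularization)
    Xs: (d, ns) labeled source data; Ys: (c, ns) one-hot source labels
    B, M: sign matrix and nonnegative margins of the relaxed
          ("epsilon-dragging") label space, as in discriminative LSR.
    The abstract's uncorrelated constraint on P would be enforced as a
    constraint during optimization and is not part of this value.
    """
    L = knn_graph_laplacian(X)
    Yr = Ys + B * M                                    # relaxed label targets
    fit = np.linalg.norm(Q.T @ P.T @ Xs - Yr) ** 2     # label-space alignment
    smooth = np.trace(P.T @ X @ L @ X.T @ P)           # graph regularization
    return fit + lam * smooth + gamma * np.linalg.norm(Q) ** 2

In a full solver, P, Q, and M would be updated alternately under the constraint; this sketch only evaluates the objective for given variables.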