Cross-Corpus Speech Emotion Recognition Based on Few-Shot Learning and Domain Adaptation

Cited by: 32
Authors
Ahn, Youngdo [1]
Lee, Sung Joo [2]
Shin, Jong Won [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Elect Engn & Comp Sci, Gwangju 500712, South Korea
[2] Elect & Telecommun Res Inst, 218 Gajeong Ro, Daejeon 34129, South Korea
Keywords
Training; Speech recognition; Emotion recognition; Neural networks; Measurement; Feature extraction; Databases; Cross-corpus; cross-lingual; few-shot learning; speech emotion recognition; unsupervised domain adaptation; adversarial; attention
DOI
10.1109/LSP.2021.3086395
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Within a single speech emotion corpus, deep neural networks have shown decent performance in speech emotion recognition. However, the performance of emotion recognition based on data-driven learning methods degrades significantly in the cross-corpus scenario. To relieve this issue without any labeled samples from the target domain, we propose a cross-corpus speech emotion recognition method based on few-shot learning and unsupervised domain adaptation, which is trained to learn the class (emotion) similarity from source-domain samples adapted to the target domain. In addition, we utilize multiple corpora in training to enhance the robustness of the recognizer to unseen samples. Experiments on emotional speech corpora in three different languages showed that the proposed method outperformed other approaches.
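To make the training idea concrete, the following is a minimal sketch of one episodic training step in PyTorch, assuming a prototypical-network-style few-shot loss and a simple linear-kernel MMD penalty as the unsupervised domain-adaptation term; the encoder, the lambda_da weight, and the choice of MMD are illustrative assumptions, not the letter's exact formulation.

    # Illustrative sketch only: a prototypical-network-style few-shot loss
    # combined with an MMD-based domain-adaptation penalty. Names, the loss
    # weighting, and the MMD choice are assumptions, not the authors' method.
    import torch
    import torch.nn.functional as F

    def prototypical_loss(support_emb, support_labels, query_emb, query_labels, n_classes):
        """Few-shot loss: classify queries by distance to per-emotion prototypes.

        Assumes every emotion class has at least one support sample in the episode.
        """
        # Prototype = mean embedding of each emotion's support samples.
        protos = torch.stack([support_emb[support_labels == c].mean(dim=0)
                              for c in range(n_classes)])
        # Negative squared Euclidean distance to each prototype acts as the logit.
        logits = -torch.cdist(query_emb, protos) ** 2
        return F.cross_entropy(logits, query_labels)

    def mmd_loss(source_emb, target_emb):
        """Linear-kernel MMD between source and unlabeled target embeddings."""
        return (source_emb.mean(dim=0) - target_emb.mean(dim=0)).pow(2).sum()

    def episode_step(encoder, optimizer, support_x, support_y, query_x, query_y,
                     target_x, n_classes, lambda_da=0.1):
        """One episodic training step: few-shot loss + domain-adaptation penalty."""
        optimizer.zero_grad()
        s_emb, q_emb = encoder(support_x), encoder(query_x)
        t_emb = encoder(target_x)  # unlabeled batch from the target corpus
        loss = (prototypical_loss(s_emb, support_y, q_emb, query_y, n_classes)
                + lambda_da * mmd_loss(torch.cat([s_emb, q_emb]), t_emb))
        loss.backward()
        optimizer.step()
        return loss.item()

In each episode, source-corpus samples are split into a labeled support set, which defines the per-emotion prototypes, and a query set scored against them, while the target-corpus batch enters only through the adaptation penalty; this mirrors the fully unsupervised use of the target domain described in the abstract.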
Pages: 1190-1194
Page count: 5