Emotion Prompting for Speech Emotion Recognition

Cited by: 0
Authors
Zhou, Xingfa [1 ]
Li, Min [2 ]
Yang, Lan [1 ]
Sun, Rui [3 ]
Wang, Xin [4 ]
Zhan, Huayi [1 ]
Affiliations
[1] Sichuan Changhong Elect Holding Grp Co Ltd, Chengdu, Peoples R China
[2] Xinjiang Univ Finance & Econ, Urumqi, Peoples R China
[3] Leshan Normal Univ, Leshan, Peoples R China
[4] Southwest Petr Univ, Nanchong, Peoples R China
Source
INTERSPEECH 2023 | 2023
Keywords
speech emotion recognition; prompt; entailment task; multi-task learning
DOI
10.21437/Interspeech.2023-1385
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Speech Emotion Recognition (SER) classifies speech into emotion categories such as Happy and Angry. Most prior work on SER has focused on mining compelling features to improve performance, but these methods ignore the influence of emotion-label information on SER. Recent studies have prompted pre-trained language models to good effect on NLP tasks; however, few works have attempted to prompt pre-trained speech models (PSMs) on speech tasks. In light of this, we propose a simple but effective prompt-based method that prompts a PSM for SER. First, we reframe SER as an entailment task. Next, we generate speech prompts and combine them with the raw audio to form the input to the PSM. Finally, we build a multi-task learning framework that extracts more compelling features by simultaneously performing automatic speech recognition (ASR) and SER. Experiments on the IEMOCAP benchmark show that our method outperforms state-of-the-art baselines on the SER task.
Pages: 3108-3112
Number of pages: 5
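
To make the pipeline in the abstract concrete, below is a minimal sketch, not the authors' implementation. It assumes PyTorch and HuggingFace Transformers, with wav2vec 2.0 standing in for the pre-trained speech model (PSM); a speech prompt is prepended to the raw waveform to form the PSM input, and an ASR head and an SER head are trained jointly. The entailment reframing is simplified here to a direct classification head, and all names (PromptedSER, prompt_wave, lam) are hypothetical.

    import torch
    import torch.nn as nn
    from transformers import Wav2Vec2Model

    class PromptedSER(nn.Module):
        """Sketch of joint ASR + SER on top of a pre-trained speech model."""
        def __init__(self, vocab_size, num_emotions,
                     psm_name="facebook/wav2vec2-base"):
            super().__init__()
            self.psm = Wav2Vec2Model.from_pretrained(psm_name)
            hidden = self.psm.config.hidden_size
            self.asr_head = nn.Linear(hidden, vocab_size)    # frame-level logits for a CTC loss
            self.ser_head = nn.Linear(hidden, num_emotions)  # utterance-level emotion logits

        def forward(self, prompt_wave, speech_wave):
            # Form the PSM input by concatenating the speech prompt with the
            # raw audio along the time axis, as the abstract describes.
            x = torch.cat([prompt_wave, speech_wave], dim=-1)  # (B, T)
            h = self.psm(x).last_hidden_state                  # (B, T', H)
            asr_logits = self.asr_head(h)                      # per frame, for ASR
            ser_logits = self.ser_head(h.mean(dim=1))          # mean-pooled, for SER
            return asr_logits, ser_logits

    # Multi-task objective (lam is an assumed tuning weight, not from the paper):
    #   loss = ctc_loss(asr_logits, transcript) + lam * ce_loss(ser_logits, emotion)

Sharing a single PSM encoder between the CTC and emotion heads is what lets the auxiliary ASR task supply extra supervisory signal to SER, which is the motivation the abstract gives for the multi-task framework.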