Pre-Finetuning for Few-Shot Emotional Speech Recognition

Cited by: 1

Authors:

Chen, Maximillian [1]

Yu, Zhou [1]

Affiliations:

[1] Columbia Univ, New York, NY 10027 USA

Source:

INTERSPEECH 2023 | 2023

Keywords:

emotion recognition; low-resource learning; pre-finetuning; transfer learning; CORPUS;

DOI:

10.21437/Interspeech.2023-136

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Speech models have long been known to overfit individual speakers for many classification tasks. This leads to poor generalization in settings where the speakers are out-of-domain or out-of-distribution, as is common in production environments. We view speaker adaptation as a few-shot learning problem and propose investigating transfer learning approaches inspired by recent success with pre-trained models in natural language tasks. We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives. We pre-finetune Wav2Vec2.0 on every permutation of four multiclass emotional speech recognition corpora and evaluate our pre-finetuned models through 33,600 few-shot fine-tuning trials on the Emotional Speech Dataset.

Pages: 3602-3606

Page count: 5

50 items in total

[21] Prompts in Few-Shot Named Entity Recognition
I. S. Rozhkov
N. V. Loukachevitch
Pattern Recognition and Image Analysis, 2023, 33 : 122 - 131
[22] Iris recognition based on few-shot learning
Lei, Songze
Dong, Baihua
Li, Yonggang
Xiao, Feng
Tian, Feng
COMPUTER ANIMATION AND VIRTUAL WORLDS, 2021, 32 (3-4)
[23] A statistical framework for few-shot action recognition
Haddad, Mark
Ghassab, Vahid K.
Najar, Fatma
Bouguila, Nizar
MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (16) : 24303 - 24318
[24] Dataset Bias in Few-Shot Image Recognition
Jiang, Shuqiang
Zhu, Yaohui
Liu, Chenlong
Song, Xinhang
Li, Xiangyang
Min, Weiqing
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (01) : 229 - 246
[25] DGPN: A Dual Graph Prototypical Network for Few-Shot Speech Spoofing Algorithm Recognition
Ge, Zirui
Xu, Xinzhou
Guo, Haiyan
Wang, Tingting
Yang, Zhen
Schuller, Bjorn W.
INTERSPEECH 2024, 2024, : 1125 - 1129
[26] RAG and Few-Shot Prompting in Emotional Text Generation
Vologina, Elizaveta
Matveeva, Anastasiia
Makhnytka, Olesia
Matveev, Yuri
Burambayeva, Nursaule
SPEECH AND COMPUTER, SPECOM 2024, PT II, 2025, 15300 : 43 - 53
[27] Muppet: Massive Multi-task Representations with Pre-Finetuning
Aghajanyan, Armen
Gupta, Anchit
Shrivastava, Akshat
Chen, Xilun
Zettlemoyer, Luke
Gupta, Sonal
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5799 - 5811
[28] Direct multimodal few-shot learning of speech and images
Nortje, Leanne
Kamper, Herman
INTERSPEECH 2021, 2021, : 2971 - 2975
[29] Few-Shot Few-Shot Learning and the role of Spatial Attention
Lifchitz, Yann
Avrithis, Yannis
Picard, Sylvaine
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 2693 - 2700
[30] A Generative Approach to Zero-Shot and Few-Shot Action Recognition
Mishra, Ashish
Verma, Vinay Kumar
Reddy, M. Shiva Krishna
Arulkumar, S.
Rai, Piyush
Mittal, Anurag
2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 372 - 380
