Pre-Finetuning for Few-Shot Emotional Speech Recognition

Cited by: 1
Authors
Chen, Maximillian [1]
Yu, Zhou [1]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
Source
INTERSPEECH 2023 | 2023
Keywords
emotion recognition; low-resource learning; pre-finetuning; transfer learning; CORPUS
DOI
10.21437/Interspeech.2023-136
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech models have long been known to overfit individual speakers for many classification tasks. This leads to poor generalization in settings where the speakers are out-of-domain or out-of-distribution, as is common in production environments. We view speaker adaptation as a few-shot learning problem and propose investigating transfer learning approaches inspired by recent success with pre-trained models in natural language tasks. We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives. We pre-finetune Wav2Vec2.0 on every permutation of four multiclass emotional speech recognition corpora and evaluate our pre-finetuned models through 33,600 few-shot fine-tuning trials on the Emotional Speech Dataset.
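As a rough illustration of what "every permutation of four corpora" can amount to, the sketch below enumerates candidate pre-finetuning schedules. The abstract does not name the corpora or say whether partial orderings (subsets of the four) are included, so the corpus names are hypothetical placeholders and both readings are shown; this is not the authors' code.

```python
from itertools import permutations

# Hypothetical placeholder names; the abstract does not name the four corpora.
CORPORA = ["corpus_a", "corpus_b", "corpus_c", "corpus_d"]


def full_orderings(corpora):
    """Orderings that use every corpus exactly once (n! schedules)."""
    return list(permutations(corpora))


def ordered_subsets(corpora):
    """Orderings of every non-empty subset, i.e. sequences of length 1..n."""
    schedules = []
    for k in range(1, len(corpora) + 1):
        schedules.extend(permutations(corpora, k))
    return schedules


if __name__ == "__main__":
    # Reading 1: only complete orderings of all four corpora.
    print(len(full_orderings(CORPORA)))
    # Reading 2 (an assumption): orderings of any non-empty subset.
    print(len(ordered_subsets(CORPORA)))
```

Each tuple is one pre-finetuning order, after which the model would be few-shot fine-tuned on the downstream dataset. Which reading matches the paper's experimental setup should be checked against the full text.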
Pages: 3602-3606
Page count: 5