Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters

被引:0
|
作者
Xi, Yuxuan [1 ]
Li, Pengcheng [1 ]
Song, Yan [1 ]
Jiang, Yiheng [1 ]
Dai, Lirong [1 ]
机构
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China
基金
中国国家自然科学基金;
关键词
D O I
10.1109/apsipaasc47483.2019.9023339
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Despite considerable recent progress in deep learning methods for speech emotion recognition (SER), performance is severely restricted by the lack of large-scale labeled speech emotion corpora. For instance, it is difficult to employ complex neural network architectures such as ResNet, which accompanied by large-sale corpora like VoxCeleb and NIST SRE, have proven to perform well for the related speaker verification (SV) task. In this paper, a novel domain adaptation method is proposed for the speech emotion recognition (SER) task, which aims to transfer related information from a speaker corpus to an emotion corpus. Specifically, a residual adapter architecture is designed for the SER task where ResNet acts as a universal model for general information extraction. An adapter module then trains limited additional parameters to focus on modeling deviation for the specific SER task. To evaluate the effectiveness of the proposed method, we conduct extensive evaluations on benchmark IEMOCAP and CHEAVD 2.0 corpora. Results show significant improvement, with overall results in each task outperforming or matching state-of-the-art methods.
引用
收藏
页码:513 / 518
页数:6
相关论文
共 50 条
  • [21] Audio-Visual Domain Adaptation Feature Fusion for Speech Emotion Recognition
    Wei, Jie
    Hu, Guanyu
    Yang, Xinyu
    Luu, Anh Tuan
    Dong, Yizhuo
    INTERSPEECH 2022, 2022, : 1988 - 1992
  • [22] DOMAIN AND SPEAKER ADAPTATION FOR CORTANA SPEECH RECOGNITION
    Zhao, Yong
    Li, Jinyu
    Zhang, Shixiong
    Chen, Liping
    Gong, Yifan
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5984 - 5988
  • [23] Improving Speech Emotion Recognition System for a Social Robot with Speaker Recognition
    Juszkiewicz, Lukasz
    2014 19TH INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS (MMAR), 2014, : 921 - 925
  • [24] Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition
    Kim, Jae-Bok
    Park, Jeong-Sik
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2016, 52 : 126 - 134
  • [25] Speaker-Characterized Emotion Recognition using Online and Iterative Speaker Adaptation
    Jae-Bok Kim
    Jeong-Sik Park
    Yung-Hwan Oh
    Cognitive Computation, 2012, 4 : 398 - 408
  • [26] SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION
    Gat, Itai
    Aronowitz, Hagai
    Zhu, Weizhong
    Morais, Edmilson
    Hoory, Ron
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7342 - 7346
  • [27] Graph Learning Based Speaker Independent Speech Emotion Recognition
    Xu, Xinzhou
    Huang, Chengwei
    Wu, Chen
    Wang, Qingyun
    Zhao, Li
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2014, 14 (02) : 17 - 22
  • [28] Speaker-Characterized Emotion Recognition using Online and Iterative Speaker Adaptation
    Kim, Jae-Bok
    Park, Jeong-Sik
    Oh, Yung-Hwan
    COGNITIVE COMPUTATION, 2012, 4 (04) : 398 - 408
  • [29] Emotion Prompting for Speech Emotion Recognition
    Zhou, Xingfa
    Li, Min
    Yang, Lan
    Sun, Rui
    Wang, Xin
    Zhan, Huayi
    INTERSPEECH 2023, 2023, : 3108 - 3112
  • [30] High-order similarity learning based domain adaptation for speech emotion recognition
    Wang, Hao
    Ji, Yixuan
    Song, Peng
    Liu, Zhaowei
    APPLIED ACOUSTICS, 2025, 231