Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters

Cited by: 0

Authors:

Xi, Yuxuan [1]

Li, Pengcheng [1]

Song, Yan [1]

Jiang, Yiheng [1]

Dai, Lirong [1]

Affiliations:

[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, People's Republic of China

Source:

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

10.1109/apsipaasc47483.2019.9023339

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

Despite considerable recent progress in deep learning methods for speech emotion recognition (SER), performance is severely restricted by the lack of large-scale labeled speech emotion corpora. For instance, it is difficult to employ complex neural network architectures such as ResNet, which, accompanied by large-scale corpora like VoxCeleb and NIST SRE, have proven to perform well for the related speaker verification (SV) task. In this paper, a novel domain adaptation method is proposed for the SER task, which aims to transfer related information from a speaker corpus to an emotion corpus. Specifically, a residual adapter architecture is designed for the SER task, in which ResNet acts as a universal model for general information extraction. An adapter module then trains a limited number of additional parameters to model the deviation specific to the SER task. To evaluate the effectiveness of the proposed method, we conduct extensive evaluations on the benchmark IEMOCAP and CHEAVD 2.0 corpora. Results show significant improvement, with overall results in each task outperforming or matching state-of-the-art methods.
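
The abstract does not spell out the adapter's exact form, so the following is only a rough sketch of the general idea under stated assumptions: each frozen 3x3 convolution in a ResNet-style backbone is paired with a small trainable 1x1 convolution, and the two outputs are summed. The channel widths, the 1x1 adapter shape, the parallel placement, and every function name here are hypothetical illustrations, not details taken from the paper. The sketch counts weights to show why such an adapter adds only a small fraction of trainable parameters.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def adapter_overhead(channels, kernel=3):
    """Compare frozen backbone weights with trainable adapter weights.

    Assumes one frozen kernel x kernel conv and one trainable 1x1 adapter
    per channel width (a hypothetical layout, not the paper's exact one).
    """
    frozen = sum(conv_params(c, c, kernel) for c in channels)
    trainable = sum(conv_params(c, c, 1) for c in channels)
    return frozen, trainable, trainable / frozen


def adapted_layer(x, frozen_fn, adapter_fn):
    """Parallel residual adapter: frozen path plus a learned correction."""
    return frozen_fn(x) + adapter_fn(x)


# Hypothetical ResNet-like channel widths, one layer per width.
channels = [64, 128, 256, 512]
frozen, trainable, fraction = adapter_overhead(channels)
print(f"frozen={frozen} trainable={trainable} fraction={fraction:.3f}")

# With a zero adapter the layer reproduces the frozen backbone exactly,
# so adaptation starts from the speaker-trained model's behaviour.
same = adapted_layer(2.0, lambda v: 3.0 * v, lambda v: 0.0)
```

Under these assumptions the adapter weights amount to one ninth of the frozen convolution weights, which illustrates how a model pretrained on a large speaker corpus might be adapted to a small emotion corpus without retraining the backbone.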

Pages: 513-518

Page count: 6
