Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters

Cited by: 0

Authors:

Xi, Yuxuan [1]

Li, Pengcheng [1]

Song, Yan [1]

Jiang, Yiheng [1]

Dai, Lirong [1]

Affiliation:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China

Source:

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

10.1109/apsipaasc47483.2019.9023339

Chinese Library Classification number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

Despite considerable recent progress in deep learning methods for speech emotion recognition (SER), performance is severely restricted by the lack of large-scale labeled speech emotion corpora. For instance, it is difficult to employ complex neural network architectures such as ResNet, which, accompanied by large-scale corpora like VoxCeleb and NIST SRE, have proven to perform well for the related speaker verification (SV) task. In this paper, a novel domain adaptation method is proposed for the SER task, which aims to transfer related information from a speaker corpus to an emotion corpus. Specifically, a residual adapter architecture is designed for the SER task, where ResNet acts as a universal model for general information extraction. An adapter module then trains a limited number of additional parameters to focus on modeling the deviation for the specific SER task. To evaluate the effectiveness of the proposed method, we conduct extensive evaluations on the benchmark IEMOCAP and CHEAVD 2.0 corpora. Results show significant improvement, with overall results in each task outperforming or matching state-of-the-art methods.
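
The abstract does not specify the adapter's internal design, so the following is only a minimal sketch of the general idea, assuming a bottleneck adapter added residually to a frozen base layer. The class `AdaptedLayer`, its `bottleneck` parameter, and the zero initialization of the up-projection are illustrative assumptions, not the authors' implementation; a plain linear map stands in for a ResNet block.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def make_matrix(rows, cols, scale, rng):
    """Create a rows x cols matrix with small random entries."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class AdaptedLayer:
    """Hypothetical layer: frozen base map plus a small residual adapter."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Stand-in for a pretrained (speaker-domain) layer; kept frozen.
        self.base = make_matrix(dim, dim, 0.5, rng)
        # Adapter weights: the only parameters that would be trained
        # on the emotion corpus in this sketch.
        self.down = make_matrix(bottleneck, dim, 0.5, rng)
        # Assumption: zero-initialized up-projection, so the adapter
        # contributes no deviation before training.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def forward(self, x):
        h = matvec(self.base, x)                        # general features
        delta = matvec(self.up, matvec(self.down, h))   # task-specific deviation
        return [hi + di for hi, di in zip(h, delta)]    # residual connection

    def trainable_parameters(self):
        return len(self.down) * len(self.down[0]) + len(self.up) * len(self.up[0])

    def frozen_parameters(self):
        return len(self.base) * len(self.base[0])


layer = AdaptedLayer(dim=8, bottleneck=2)
x = [1.0] * 8
y = layer.forward(x)
```

In this sketch the adapter holds fewer parameters than the frozen base layer, which mirrors the abstract's point that only limited additional parameters are trained for the target task.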


Pages: 513-518

Page count: 6

50 entries in total

[31] Speaker Clustering in Emotion Recognition
Ding, Ni
Epps, Julien
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1162 - 1165
[32] RobinNet: A Multimodal Speech Emotion Recognition System With Speaker Recognition for Social Interactions
Khurana, Yash
Gupta, Swamita
Sathyaraj, R.
Raja, S. P.
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2022, 11 (01) : 478 - 487
[33] EMOTION CONTROLLABLE SPEECH SYNTHESIS USING EMOTION-UNLABELED DATASET WITH THE ASSISTANCE OF CROSS-DOMAIN SPEECH EMOTION RECOGNITION
Cai, Xiong
Dai, Dongyang
Wu, Zhiyong
Li, Xiang
Li, Jingbei
Meng, Helen
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5734 - 5738
[34] Speech emotion recognition based on time domain feature
Zhao, Lasheng
Wei, Xiaopeng
Zhang, Qiang
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE INFORMATION COMPUTING AND AUTOMATION, VOLS 1-3, 2008, : 1319 - 1321
[35] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
Fahad, Md Shah
Ranjan, Ashish
Deepak, Akshay
Pradhan, Gayadhar
CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (11) : 6113 - 6135
[36] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
Md Shah Fahad
Ashish Ranjan
Akshay Deepak
Gayadhar Pradhan
Circuits, Systems, and Signal Processing, 2022, 41 : 6113 - 6135
[37] Speech Emotion Recognition
Lalitha, S.
Madhavan, Abhishek
Bhushan, Bharath
Saketh, Srinivas
2014 INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRONICS, COMPUTERS AND COMMUNICATIONS (ICAECC), 2014,
[38] Speech emotion recognition based on emotion perception
Liu, Gang
Cai, Shifang
Wang, Ce
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (01)
[39] Autoencoder With Emotion Embedding for Speech Emotion Recognition
Zhang, Chenghao
Xue, Lei
IEEE ACCESS, 2021, 9 : 51231 - 51241
[40] Speaker and gender dependencies in within/cross linguistic Speech Emotion Recognition
Chakhtouna A.
Sekkate S.
Adib A.
International Journal of Speech Technology, 2023, 26 (03) : 609 - 625
