Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters

Cited by: 0

Authors:

Xi, Yuxuan [1]

Li, Pengcheng [1]

Song, Yan [1]

Jiang, Yiheng [1]

Dai, Lirong [1]

Affiliations:

[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, People's Republic of China

Source:

2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2019

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

10.1109/apsipaasc47483.2019.9023339

Chinese Library Classification number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Despite considerable recent progress in deep learning methods for speech emotion recognition (SER), performance is severely restricted by the lack of large-scale labeled speech emotion corpora. For instance, it is difficult to employ complex neural network architectures such as ResNet, which, accompanied by large-scale corpora like VoxCeleb and NIST SRE, have proven to perform well for the related speaker verification (SV) task. In this paper, a novel domain adaptation method is proposed for the SER task, which aims to transfer related information from a speaker corpus to an emotion corpus. Specifically, a residual adapter architecture is designed for the SER task, in which ResNet acts as a universal model for general information extraction. An adapter module then trains a limited number of additional parameters to model the deviation specific to the SER task. To evaluate the effectiveness of the proposed method, we conduct extensive evaluations on the benchmark IEMOCAP and CHEAVD 2.0 corpora. Results show significant improvement, with overall results in each task outperforming or matching state-of-the-art methods.
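The adapter idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the class name `AdaptedLayer`, the low-rank bottleneck form of the adapter, the layer sizes, and the random "pretrained" weights are all assumptions made for illustration. It only shows the general pattern of a frozen shared transform (standing in for a ResNet layer trained on speaker data) plus a small trainable branch that models a task-specific deviation.

```python
import random


def matvec(weights, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; a stand-in for learned weights (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class AdaptedLayer:
    """Hypothetical layer: frozen shared weights plus a small parallel adapter."""

    def __init__(self, dim, bottleneck, rng):
        # Shared "universal" weights, assumed pretrained on a speaker corpus
        # and kept frozen while adapting to the emotion task.
        self.shared = random_matrix(dim, dim, rng)
        # Adapter branch (assumed low-rank): project down, then back up.
        self.down = random_matrix(bottleneck, dim, rng)
        # Zero initialisation makes the adapted layer start out identical
        # to the frozen layer, so training only learns a deviation.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def forward(self, x):
        base = matvec(self.shared, x)                      # general features
        deviation = matvec(self.up, matvec(self.down, x))  # task-specific part
        return [max(0.0, b + d) for b, d in zip(base, deviation)]  # ReLU

    def frozen_param_count(self):
        return sum(len(row) for row in self.shared)

    def trainable_param_count(self):
        return sum(len(row) for row in self.down) + sum(len(row) for row in self.up)


layer = AdaptedLayer(dim=8, bottleneck=2, rng=random.Random(0))
features = layer.forward([1.0] * 8)
print(layer.frozen_param_count(), layer.trainable_param_count())  # prints: 64 32
```

In this toy setting the adapter adds 32 trainable parameters next to 64 frozen ones. The ratio is arbitrary here, but it reflects the motivation stated in the abstract: with a small emotion corpus, only a limited number of additional parameters are trained.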


Pages: 513-518

Number of pages: 6

