SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Cited by: 39
Authors
Gat, Itai [1 ]
Aronowitz, Hagai [1 ]
Zhu, Weizhong [1 ]
Morais, Edmilson [1 ]
Hoory, Ron [1 ]
Affiliations
[1] IBM Res AI, Albany, NY 12203 USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Speech emotion recognition; speaker normalization; self-supervised learning;
DOI
10.1109/ICASSP43922.2022.9747460
Chinese Library Classification
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversarial learning framework that learns a speech emotion recognition task while normalizing speaker characteristics out of the feature representation. We demonstrate the efficacy of our method in both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
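The abstract's "gradient-based adversarial learning" is in the family of gradient-reversal training (Ganin, ref [7] below): a speaker-ID head is trained on the shared representation, but its gradient is sign-flipped before reaching the encoder, so the encoder is pushed to discard speaker cues while still serving the emotion classifier. The sketch below illustrates only that sign-flip arithmetic on toy NumPy data; the shapes, the scalar `lambd`, and the precomputed per-head gradients are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 16 utterances with 8-dim features (stand-ins for
# self-supervised speech embeddings) and a shared linear encoder.
x = rng.normal(size=(16, 8))
W_enc = 0.1 * rng.normal(size=(8, 8))

# Hypothetical gradients of the emotion loss and the speaker-ID loss
# w.r.t. the shared representation; in a real model these would come from
# backpropagating each head's cross-entropy loss.
g_emotion = rng.normal(size=(16, 8))
g_speaker = rng.normal(size=(16, 8))

lambd = 0.5  # reversal strength (a tunable hyperparameter)

# Gradient reversal: the speaker branch's gradient is sign-flipped before
# it reaches the encoder, so the encoder learns to remove speaker
# information while keeping emotion-relevant information.
g_shared = g_emotion - lambd * g_speaker

# One plain SGD step on the encoder with the combined gradient.
lr = 1e-2
W_enc -= lr * (x.T @ g_shared) / len(x)
print(W_enc.shape)  # (8, 8)
```

The speaker head itself is still trained normally to predict speaker identity; only the gradient flowing past the reversal point into the encoder is negated.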
Pages: 7342-7346
Page count: 5
References
31 items in total
[1] [Anonymous], 2016, ICASSP.
[2] [Anonymous], IEEE SIGNAL PROCESSI
[3] Baevski, A., et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. Advances in Neural Information Processing Systems, 2020.
[4] Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J. N., Lee, S., Narayanan, S. S. IEMOCAP: interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008, 42(4): 335-359.
[5] Chung, Y.-A., Tang, H., Glass, J. Vector-Quantized Autoregressive Predictive Coding. INTERSPEECH 2020: 3760-3764.
[6] Chung, Y.-A., Hsu, W.-N., Tang, H., Glass, J. An Unsupervised Autoregressive Model for Speech Representation Learning. INTERSPEECH 2019: 146-150.
[7] Ganin, Y., Lempitsky, V. Unsupervised Domain Adaptation by Backpropagation. ICML 2015. arXiv:1409.7495.
[8] Hsu, W.-N., et al., 2021, arXiv:2106.07447.
[9] Jalal, A., 2020, INTERSPEECH.
[10] Keren, G., 2016, IJCNN, p. 2.