SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION

Cited by: 24
Authors
Gat, Itai [1 ]
Aronowitz, Hagai [1 ]
Zhu, Weizhong [1 ]
Morais, Edmilson [1 ]
Hoory, Ron [1 ]
Affiliation
[1] IBM Res AI, Albany, NY 12203 USA
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Speech emotion recognition; speaker normalization; self-supervised learning
DOI
10.1109/ICASSP43922.2022.9747460
CLC Number
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases. Deep-net-based classifiers, in turn, are prone to exploit those biases and find shortcuts such as speaker characteristics. These shortcuts usually harm a model's ability to generalize. To address this challenge, we propose a gradient-based adversary learning framework that learns a speech emotion recognition task while normalizing speaker characteristics from the feature representation. We demonstrate the efficacy of our method on both speaker-independent and speaker-dependent settings and obtain new state-of-the-art results on the challenging IEMOCAP dataset.
Pages: 7342-7346
Page count: 5
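The abstract's "gradient-based adversary learning" for speaker normalization is commonly realized with a gradient-reversal layer in the style of domain-adversarial training (Ganin & Lempitsky): a speaker classifier is attached to the shared features, and its gradient is sign-flipped before reaching the encoder, so the encoder is pushed to discard speaker cues while still serving the emotion task. The NumPy toy below is only an illustrative sketch of that mechanism, not the authors' implementation; all names (`grad_reverse_backward`, `w_emo`, `w_spk`, `lam`) and the squared-error stand-in losses are hypothetical.

```python
import numpy as np

def grad_reverse_backward(grad, lam=1.0):
    # Gradient-reversal layer: identity in the forward pass,
    # multiply the incoming gradient by -lam in the backward pass.
    return -lam * grad

# Toy shared feature vector and two linear heads (emotion, speaker).
rng = np.random.default_rng(0)
h = rng.normal(size=4)       # shared encoder feature
w_emo = rng.normal(size=4)   # emotion-head weights
w_spk = rng.normal(size=4)   # speaker-head weights

# Gradients of squared-error losses w.r.t. h (stand-ins for the
# real classification losses): L = (w @ h - target)**2.
g_emo = 2.0 * (w_emo @ h - 1.0) * w_emo   # dL_emo/dh
g_spk = 2.0 * (w_spk @ h - 0.0) * w_spk   # dL_spk/dh

# The encoder sees the task gradient plus the *reversed* speaker
# gradient, so it learns features that help emotion recognition
# while making the speaker classifier worse (speaker invariance).
g_encoder = g_emo + grad_reverse_backward(g_spk, lam=0.5)
```

In a full system the same sign flip is applied inside the autodiff graph (e.g. a custom backward function), so a single optimizer step both minimizes the emotion loss and maximizes the speaker loss with respect to the encoder.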