Curriculum Learning for Speech Emotion Recognition From Crowdsourced Labels

Cited by: 56
Authors
Lotfian, Reza [1 ]
Busso, Carlos [1 ]
Affiliations
[1] Univ Texas Dallas, Erik Jonsson Sch Elect & Comp Engn, Richardson, TX 75080 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
Curriculum learning; speech emotion recognition; inter-evaluator agreement; CORPUS; CLASSIFICATION; INTELLIGENCE;
DOI
10.1109/TASLP.2019.2898816
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline classification codes
070206; 082403
Abstract
This study introduces a method to design a curriculum for machine learning that maximizes the efficiency of the training process of deep neural networks (DNNs) for speech emotion recognition. Previous studies in other machine-learning problems have shown the benefits of training a classifier following a curriculum in which samples are gradually presented in increasing order of difficulty. For speech emotion recognition, the challenge is to establish a natural order of difficulty in the training set to create the curriculum. We address this problem by assuming that samples that are ambiguous for humans are also ambiguous for computers. Speech samples are often annotated by multiple evaluators to account for differences in emotion perception across individuals. While some sentences with clear emotional content are consistently annotated, sentences with more ambiguous emotional content show substantial disagreement between individual evaluations. We propose to use the disagreement between evaluators as a measure of difficulty for the classification task. We propose metrics that quantify the inter-evaluator agreement to define the curriculum for regression problems as well as binary and multi-class classification problems. The experimental results consistently show that relying on a curriculum based on agreement between human judgments leads to statistically significant improvements over baselines trained without a curriculum.
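To make the idea in the abstract concrete, the sketch below (an illustration, not the authors' implementation) scores each training utterance by the spread of its per-evaluator ratings and releases the data to the classifier in stages of increasing difficulty. The field names ("id", "ratings"), the number of stages, and the cumulative schedule are assumptions made for the example; for categorical labels, an entropy-style measure over the evaluators' votes could replace the standard deviation used here.

```python
# Minimal sketch, assuming each sample stores the raw per-evaluator ratings for a
# continuous attribute (e.g., arousal on a 1-7 scale). Not the authors' code.
from statistics import pstdev

def difficulty(ratings):
    """Inter-evaluator disagreement (population std. of ratings): low = easy."""
    return pstdev(ratings)

def curriculum_stages(samples, n_stages=4):
    """Sort samples from highest to lowest agreement and split them into stages."""
    ordered = sorted(samples, key=lambda s: difficulty(s["ratings"]))
    size = -(-len(ordered) // n_stages)          # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

if __name__ == "__main__":
    # Toy data: three evaluators rated the arousal of each utterance.
    samples = [
        {"id": "utt_1", "ratings": [5, 5, 5]},   # full agreement   -> easy
        {"id": "utt_2", "ratings": [2, 6, 4]},   # high disagreement -> hard
        {"id": "utt_3", "ratings": [3, 4, 3]},
        {"id": "utt_4", "ratings": [1, 7, 4]},
    ]
    pool = []
    for k, stage in enumerate(curriculum_stages(samples, n_stages=2)):
        pool.extend(stage)                       # stage k trains on bins 0..k
        print(f"stage {k}: train on {[s['id'] for s in pool]}")
        # A speech-emotion DNN would be (re)trained on `pool` at this point.
```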
Pages: 815-826
Page count: 12
Related Papers
50 records in total
  • [1] Meta-Learning for Speech Emotion Recognition Considering Ambiguity of Emotion Labels
    Fujioka, Takuya
    Homma, Takeshi
    Nagamatsu, Kenji
    INTERSPEECH 2020, 2020, : 2332 - 2336
  • [2] Learning Attributes from the Crowdsourced Relative Labels
    Tian, Tian
    Chen, Ning
    Zhu, Jun
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1562 - 1568
  • [3] Emotion Recognition with Refined Labels for Deep Learning
    Zhang, Su
    Guan, Cuntai
    42ND ANNUAL INTERNATIONAL CONFERENCES OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY: ENABLING INNOVATIVE TECHNOLOGIES FOR GLOBAL HEALTHCARE EMBC'20, 2020, : 108 - 111
  • [4] Learning Alignment for Multimodal Emotion Recognition from Speech
    Xu, Haiyang
    Zhang, Hui
    Han, Kun
    Wang, Yun
    Peng, Yiping
    Li, Xiangang
    INTERSPEECH 2019, 2019, : 3569 - 3573
  • [5] Emotion Recognition from Speech: An Unsupervised Learning Approach
    Rovetta, Stefano
    Mnasri, Zied
    Masulli, Francesco
    Cabri, Alberto
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2021, 14 (01) : 23 - 35
  • [6] Representation Learning for Speech Emotion Recognition
    Ghosh, Sayan
    Laksana, Eugene
    Morency, Louis-Philippe
    Scherer, Stefan
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3603 - 3607
  • [7] Speech Emotion Recognition with Deep Learning
    Harar, Pavol
    Burget, Radim
    Dutta, Malay Kishore
    2017 4TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2017, : 137 - 140
  • [8] Transfer Learning for Speech Emotion Recognition
    Han Zhijie
    Zhao, Huijuan
    Wang, Ruchuan
    2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019, : 96 - 99
  • [9] AESR: Speech Recognition With Speech Emotion Recogniting Learning
    Han, RongQi
    Liu, Xin
    Zhang, Hui
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024, 2025, 2312 : 91 - 101
  • [10] Speech Emotion Recognition Based on Joint Self-Assessment Manikins and Emotion Labels
    Chen, Jing-Ming
    Chang, Pao-Chi
    Liang, Kai-Wen
    2019 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2019), 2019, : 327 - 330