An Adversarial Training Based Speech Emotion Classifier With Isolated Gaussian Regularization

Cited by: 3
Authors
Fu, Changzeng [1 ,2 ]
Liu, Chaoran [2 ]
Ishi, Carlos Toshinori [3 ]
Ishiguro, Hiroshi [1 ]
Affiliations
[1] Osaka Univ, Grad Sch Engn Sci, Osaka 5608531, Japan
[2] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 6190237, Japan
[3] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 3510198, Japan
Keywords
Speech emotion recognition; Adversarial training; Regularization; Model
DOI
10.1109/TAFFC.2022.3169091
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker-individual bias may cause emotion-related features to form clusters with irregular borders (non-Gaussian distributions), making the model sensitive to local irregularities in the pattern distribution and causing it to overfit the in-domain dataset. This problem can lower validation scores in cross-domain (i.e., speaker-independent, channel-variant) settings. To mitigate it, we propose an adversarial-training-based classifier that regularizes the distribution of latent representations in order to smooth the boundaries among categories. In the regularization phase, the representations are mapped onto Gaussian distributions in an unsupervised manner to improve their discriminative ability. Whereas our previous study mapped the latent representations onto a single Gaussian distribution, the present work adopts a mixture of isolated Gaussian distributions. Moreover, multi-instance learning is adopted by dividing each utterance into a bag of segments, so that the model can capture the segments most salient for expressing an emotion. The model was evaluated on the IEMOCAP and MELD datasets in in-corpus, speaker-independent settings. In addition, we investigated accuracy in cross-corpus settings to simulate speaker-independent, channel-variant conditions. In the experiments, the proposed model was compared not only with baseline models but also with different configurations of our own model. The results show that the proposed model is competitive with the baselines under both in-corpus and cross-corpus validation.
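The following is a minimal, illustrative PyTorch sketch of the regularization idea described in the abstract, assuming an adversarial-autoencoder-style setup: a discriminator pushes encoder outputs toward a prior built from a mixture of well-separated ("isolated") Gaussian components, while a classifier head predicts the emotion label. The component layout (means on a circle), feature dimension, and network sizes are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch: adversarial regularization of latent codes toward a
# mixture of isolated Gaussians (not the authors' exact implementation).
import math
import torch
import torch.nn as nn

LATENT_DIM, NUM_CLASSES = 16, 4        # assumed latent size and emotion classes
PRIOR_STD, PRIOR_RADIUS = 0.5, 5.0     # small std + large radius -> isolated components


def sample_isolated_gaussian_prior(batch_size):
    """Draw samples from a mixture of well-separated Gaussian components."""
    comp = torch.randint(0, NUM_CLASSES, (batch_size,))
    angles = 2 * math.pi * comp.float() / NUM_CLASSES
    means = torch.zeros(batch_size, LATENT_DIM)
    means[:, 0] = PRIOR_RADIUS * torch.cos(angles)   # place component means on a circle
    means[:, 1] = PRIOR_RADIUS * torch.sin(angles)
    return means + PRIOR_STD * torch.randn(batch_size, LATENT_DIM)


encoder = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
discriminator = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
classifier = nn.Linear(LATENT_DIM, NUM_CLASSES)

bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)


def training_step(features, labels):
    """One step: supervised emotion classification + unsupervised prior matching."""
    z = encoder(features)
    n = features.size(0)

    # Discriminator phase: real = prior samples, fake = (detached) encoder outputs.
    prior = sample_isolated_gaussian_prior(n)
    d_loss = bce(discriminator(prior), torch.ones(n, 1)) + \
             bce(discriminator(z.detach()), torch.zeros(n, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Encoder/classifier phase: fool the discriminator and classify emotions.
    g_loss = bce(discriminator(z), torch.ones(n, 1)) + ce(classifier(z), labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()


# Toy usage with random 40-dim acoustic features and 4 emotion labels.
x = torch.randn(8, 40)
y = torch.randint(0, NUM_CLASSES, (8,))
print(training_step(x, y))
```

In this sketch, keeping each component's standard deviation small relative to the separation between component means is what makes the prior "isolated": the encoder is encouraged to place latent codes in compact, well-separated regions, which is one plausible way to smooth class boundaries as the abstract describes.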
Pages: 2361 - 2374
Number of pages: 14