An Adversarial Training Based Speech Emotion Classifier With Isolated Gaussian Regularization

Cited by: 3
Authors
Fu, Changzeng [1 ,2 ]
Liu, Chaoran [2 ]
Ishi, Carlos Toshinori [3 ]
Ishiguro, Hiroshi [1 ]
Affiliations
[1] Osaka Univ, Grad Sch Engn Sci, Osaka 5608531, Japan
[2] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 6190237, Japan
[3] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 3510198, Japan
Keywords
Speech emotion recognition; adversarial training; regularization; model
DOI
10.1109/TAFFC.2022.3169091
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Speaker-specific bias can cause emotion-related features to form clusters with irregular borders (non-Gaussian distributions), making a model sensitive to local irregularities in the pattern distribution and prone to overfitting the in-domain dataset. This in turn degrades validation scores in cross-domain (i.e., speaker-independent, channel-variant) settings. To mitigate this problem, we propose an adversarial-training-based classifier that regularizes the distribution of latent representations and thereby smooths the boundaries among categories. In the regularization phase, the representations are mapped onto Gaussian distributions in an unsupervised manner to improve their discriminative ability. Whereas our previous study mapped the latent representations onto a single Gaussian distribution, the present work adopts a mixture of isolated Gaussian distributions. Moreover, multi-instance learning is adopted by dividing each utterance into a bag of segments, allowing the model to capture the segments most salient for expressing an emotion. The model was evaluated on the IEMOCAP and MELD datasets under in-corpus, speaker-independent settings. In addition, we investigated accuracy under cross-corpus settings that simulate speaker-independent and channel-variant conditions. In the experiments, the proposed model was compared not only with baseline models but also with different configurations of our own model. The results show that the proposed model is competitive with the baselines under both in-corpus and cross-corpus validation.
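The regularization idea described in the abstract can be sketched concretely: instead of a single Gaussian prior, target latent codes are drawn from a mixture of well-separated ("isolated") class-conditional Gaussians, toward which a discriminator would then push the encoder's latent representations during adversarial training. Below is a minimal NumPy sketch of such a prior under those assumptions; the function name, the axis-aligned spacing scheme, and all parameter values are illustrative and not taken from the paper.

```python
import numpy as np

def isolated_gaussian_prior(n_per_class, n_classes, dim, spacing=10.0, std=1.0, seed=0):
    """Sample target latent codes from a mixture of isolated Gaussians.

    Each emotion class gets its own Gaussian component whose mean is placed
    `spacing * (k + 1)` along a separate coordinate axis, so that with
    spacing >> std the components barely overlap ("isolated"). In an
    adversarial setup, a discriminator distinguishes these samples from the
    encoder's latents, and the encoder is trained to fool it.
    (Hypothetical sketch; not the authors' implementation.)
    """
    rng = np.random.default_rng(seed)
    samples, labels = [], []
    for k in range(n_classes):
        mean = np.zeros(dim)
        mean[k % dim] = spacing * (k + 1)  # well-separated component means
        samples.append(rng.normal(mean, std, size=(n_per_class, dim)))
        labels.append(np.full(n_per_class, k))
    return np.concatenate(samples), np.concatenate(labels)

# Draw target codes for 4 emotion classes in an 8-dimensional latent space.
z, y = isolated_gaussian_prior(n_per_class=200, n_classes=4, dim=8)
```

Because the component means are far apart relative to their standard deviation, latents regularized toward this prior inherit smooth, roughly Gaussian per-class boundaries rather than the irregular cluster borders induced by speaker bias.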
Pages: 2361-2374 (14 pages)