An Adversarial Training Based Speech Emotion Classifier With Isolated Gaussian Regularization

Times cited: 2
Authors
Fu, Changzeng [1, 2]
Liu, Chaoran [2]
Ishi, Carlos Toshinori [3]
Ishiguro, Hiroshi [1]
Affiliations
[1] Osaka Univ, Grad Sch Engn Sci, Osaka 5608531, Japan
[2] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 6190237, Japan
[3] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 3510198, Japan
Keywords
Speech emotion recognition; Adversarial training; regularization; MODEL
DOI
10.1109/TAFFC.2022.3169091
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Individual speaker bias may cause emotion-related features to form clusters with irregular borders (non-Gaussian distributions), making the model sensitive to local irregularities in the pattern distributions and causing it to overfit the in-domain dataset. This can lower validation scores in cross-domain (i.e., speaker-independent, channel-variant) settings. To mitigate this problem, we propose an adversarial training-based classifier that regularizes the distribution of latent representations to further smooth the boundaries among different categories. In the regularization phase, the representations are mapped into Gaussian distributions in an unsupervised manner to improve their discriminative ability. Whereas our previous study mapped the latent representations onto a single Gaussian distribution, the present work adopts a mixture of isolated Gaussian distributions. Moreover, multi-instance learning is adopted by dividing each speech utterance into a bag of segments to capture the parts that most saliently express an emotion. The model was evaluated on the IEMOCAP and MELD datasets in in-corpus speaker-independent settings. In addition, we investigated accuracy in cross-corpus settings to simulate speaker-independent and channel-variant conditions. In the experiments, the proposed model was compared not only with baseline models but also with different configurations of itself. The results show that the proposed model is competitive with the baselines in both in-corpus and cross-corpus validation.
Pages: 2361-2374
Number of pages: 14
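The abstract names two mechanisms without giving implementation details. The first is an unsupervised prior built from isolated Gaussian components, against which latent representations are adversarially regularized. The second is multi-instance learning over a bag of speech segments. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. The helper names (`isolated_gaussian_means`, `sample_prior`, `split_into_bag`, `bag_prediction`) are hypothetical. "Isolated" is interpreted here as component means placed on scaled one-hot axes, far apart relative to the standard deviation. Max pooling is assumed as the bag-level aggregation. The encoder, discriminator, and training loop are omitted.

```python
import random
from typing import List, Sequence, Tuple


def isolated_gaussian_means(num_components: int, dim: int, scale: float) -> List[List[float]]:
    """Hypothetical layout: component k has mean scale * e_k (a scaled one-hot
    vector), so any two means are scale * sqrt(2) apart."""
    if dim < num_components:
        raise ValueError("one-hot means need dim >= num_components")
    return [[scale if d == k else 0.0 for d in range(dim)] for k in range(num_components)]


def sample_prior(
    n: int, means: List[List[float]], std: float, rng: random.Random
) -> Tuple[List[List[float]], List[int]]:
    """Draw n latent vectors from an equal-weight mixture of isotropic Gaussians.

    The component is picked uniformly at random, so no emotion labels are used
    (assumed to match the unsupervised mapping described in the abstract).
    """
    samples: List[List[float]] = []
    components: List[int] = []
    for _ in range(n):
        k = rng.randrange(len(means))
        samples.append([rng.gauss(mu, std) for mu in means[k]])
        components.append(k)
    return samples, components


def split_into_bag(frames: Sequence, seg_len: int, hop: int) -> List[list]:
    """Split a frame sequence into fixed-length, possibly overlapping segments.

    An utterance no longer than seg_len becomes a single-segment bag; frames
    after the last full segment are dropped (a simplifying assumption).
    """
    if len(frames) <= seg_len:
        return [list(frames)]
    return [list(frames[i:i + seg_len]) for i in range(0, len(frames) - seg_len + 1, hop)]


def bag_prediction(segment_scores: List[List[float]]) -> int:
    """Max-pool per-class scores over segments, then return the arg-max class.

    With max pooling, the highest-scoring segment for each class sets that
    class's bag-level score.
    """
    num_classes = len(segment_scores[0])
    pooled = [max(scores[c] for scores in segment_scores) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: pooled[c])


if __name__ == "__main__":
    rng = random.Random(0)
    means = isolated_gaussian_means(num_components=4, dim=8, scale=10.0)
    z, comps = sample_prior(5, means, std=0.5, rng=rng)
    print(comps)
    print(split_into_bag(list(range(10)), seg_len=4, hop=2))
    print(bag_prediction([[0.1, 0.9, 0.0], [0.8, 0.2, 0.0]]))
```

In an adversarial-autoencoder-style setup, samples from `sample_prior` would act as the "real" inputs to a discriminator and the encoder's latent codes as the "fake" inputs. The encoder is then trained until its codes are hard to distinguish from the mixture prior.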