An Adversarial Training Based Speech Emotion Classifier With Isolated Gaussian Regularization

Cited by: 2
Authors
Fu, Changzeng [1 ,2 ]
Liu, Chaoran [2 ]
Ishi, Carlos Toshinori [3 ]
Ishiguro, Hiroshi [1 ]
Affiliations
[1] Osaka Univ, Grad Sch Engn Sci, Osaka 5608531, Japan
[2] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 6190237, Japan
[3] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 3510198, Japan
Keywords
Speech emotion recognition; adversarial training; regularization; model
DOI
10.1109/TAFFC.2022.3169091
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker-specific bias can cause emotion-related features to form clusters with irregular borders (non-Gaussian distributions), making the model sensitive to local irregularities of the pattern distribution and prone to overfitting the in-domain dataset. This, in turn, can lower validation scores in cross-domain (i.e., speaker-independent, channel-variant) settings. To mitigate this problem, we propose an adversarial-training-based classifier that regularizes the distribution of latent representations and thereby smooths the boundaries between categories. In the regularization phase, the representations are mapped to Gaussian distributions in an unsupervised manner to improve their discriminative ability. Whereas our previous study mapped the latent representations to a single Gaussian distribution, the present work adopts a mixture of isolated Gaussian distributions. Moreover, multi-instance learning is adopted by dividing each speech sample into a bag of segments, allowing the model to capture the segments most salient in expressing an emotion. The model was evaluated on the IEMOCAP and MELD datasets in in-corpus, speaker-independent settings. In addition, we investigated accuracy in cross-corpus settings that simulate speaker-independent and channel-variant conditions. In the experiments, the proposed model was compared not only with baseline models but also with different configurations of our own model. The results show that the proposed model is competitive with the baselines under both in-corpus and cross-corpus validation.
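The two core ideas of the abstract can be illustrated with a minimal sketch. In an adversarial-autoencoder-style setup, a discriminator would be trained to tell encoder outputs apart from samples drawn from a prior made of well-separated ("isolated") Gaussian components, one per emotion class, while the encoder learns to fool it; multi-instance learning then pools segment-level scores into an utterance-level decision. The component placement (means on a circle) and the function names below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def sample_isolated_gaussian_prior(labels, n_classes, dim, radius=5.0, std=1.0, rng=None):
    """Draw prior samples from a mixture of isolated Gaussians: one
    component per emotion class. Component means are placed on a circle
    of the given radius in the first two latent dimensions, so the
    components barely overlap (hypothetical placement for illustration)."""
    rng = np.random.default_rng() if rng is None else rng
    angles = 2.0 * np.pi * np.arange(n_classes) / n_classes
    means = np.zeros((n_classes, dim))
    means[:, 0] = radius * np.cos(angles)
    means[:, 1] = radius * np.sin(angles)
    # each sample is drawn from the Gaussian component of its class label
    return means[labels] + std * rng.standard_normal((len(labels), dim))

def mil_utterance_score(segment_scores):
    """Multi-instance pooling: treat an utterance as a bag of per-segment
    class scores and keep the most salient segment per class (max pooling)."""
    return segment_scores.max(axis=0)
```

In training, samples from `sample_isolated_gaussian_prior` would play the "real" role for the discriminator and encoder outputs the "fake" role; max pooling is one common multi-instance choice for picking the segment that most strongly expresses an emotion.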
Pages: 2361-2374 (14 pages)