An Adversarial Training Based Speech Emotion Classifier With Isolated Gaussian Regularization

Cited by: 3
Authors
Fu, Changzeng [1 ,2 ]
Liu, Chaoran [2 ]
Ishi, Carlos Toshinori [3 ]
Ishiguro, Hiroshi [1 ]
Affiliations
[1] Osaka Univ, Grad Sch Engn Sci, Osaka 5608531, Japan
[2] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 6190237, Japan
[3] RIKEN, Robot Project, Interact Robot Res Team, Kyoto 3510198, Japan
Keywords
Speech emotion recognition; adversarial training; regularization; model
DOI
10.1109/TAFFC.2022.3169091
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Speaker-specific bias can cause emotion-related features to form clusters with irregular borders (non-Gaussian distributions), making a model sensitive to local irregularities in the pattern distribution and prone to overfitting the in-domain dataset. This in turn degrades validation scores in cross-domain (i.e., speaker-independent, channel-variant) settings. To mitigate this problem, we propose an adversarial-training-based classifier that regularizes the distribution of latent representations and thereby smooths the boundaries among categories. In the regularization phase, the representations are mapped onto Gaussian distributions in an unsupervised manner to improve their discriminative ability. Whereas our previous study mapped the latent representations onto a single Gaussian distribution, the present work adopts a mixture of isolated Gaussian distributions. Moreover, multi-instance learning is adopted by dividing each utterance into a bag of segments, allowing the model to capture the segments most salient for expressing an emotion. The model was evaluated on the IEMOCAP and MELD datasets under in-corpus, speaker-independent settings. In addition, we investigated accuracy under cross-corpus settings that simulate speaker-independent and channel-variant conditions. In the experiments, the proposed model was compared not only with baseline models but also with different configurations of our own model. The results show that the proposed model is competitive with the baselines under both in-corpus and cross-corpus validation.
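The regularization idea described in the abstract can be sketched concretely: instead of a single Gaussian prior, target latent codes are drawn from a mixture of well-separated ("isolated") class-conditional Gaussians, toward which a discriminator would then push the encoder's latent representations during adversarial training. Below is a minimal NumPy sketch of such a prior under those assumptions; the function name, the axis-aligned spacing scheme, and all parameter values are illustrative and not taken from the paper.

```python
import numpy as np

def isolated_gaussian_prior(n_per_class, n_classes, dim, spacing=10.0, std=1.0, seed=0):
    """Sample target latent codes from a mixture of isolated Gaussians.

    Each emotion class gets its own Gaussian component whose mean is placed
    `spacing * (k + 1)` along a separate coordinate axis, so that with
    spacing >> std the components barely overlap ("isolated"). In an
    adversarial setup, a discriminator distinguishes these samples from the
    encoder's latents, and the encoder is trained to fool it.
    (Hypothetical sketch; not the authors' implementation.)
    """
    rng = np.random.default_rng(seed)
    samples, labels = [], []
    for k in range(n_classes):
        mean = np.zeros(dim)
        mean[k % dim] = spacing * (k + 1)  # well-separated component means
        samples.append(rng.normal(mean, std, size=(n_per_class, dim)))
        labels.append(np.full(n_per_class, k))
    return np.concatenate(samples), np.concatenate(labels)

# Draw target codes for 4 emotion classes in an 8-dimensional latent space.
z, y = isolated_gaussian_prior(n_per_class=200, n_classes=4, dim=8)
```

Because the component means are far apart relative to their standard deviation, latents regularized toward this prior inherit smooth, roughly Gaussian per-class boundaries rather than the irregular cluster borders induced by speaker bias.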
Pages: 2361-2374 (14 pages)