Improving Speech Emotion Recognition With Adversarial Data Augmentation Network

Cited by: 69
Authors
Yi, Lu [1]
Mak, Man-Wai [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Hong Kong, Peoples R China
Keywords
Generators; Feature extraction; Training; Emotion recognition; Speech recognition; Generative adversarial networks; Gallium nitride; Data augmentation; generative adversarial networks (GANs); speech emotion recognition; Wasserstein divergence; NEURAL-NETWORKS; MODEL;
DOI
10.1109/TNNLS.2020.3027600
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
When training data are scarce, it is difficult to train a deep neural network without overfitting. To overcome this challenge, this article proposes a new data augmentation network, namely the adversarial data augmentation network (ADAN), based on generative adversarial networks (GANs). The ADAN consists of a GAN, an autoencoder, and an auxiliary classifier. These networks are trained adversarially to synthesize class-dependent feature vectors in both the latent space and the original feature space, which can be added to the real training data for training classifiers. Instead of the conventional cross-entropy loss for adversarial training, the Wasserstein divergence is used in an attempt to produce high-quality synthetic samples. The proposed networks were applied to speech emotion recognition using EmoDB and IEMOCAP as the evaluation data sets. It was found that by forcing the synthetic latent vectors and the real latent vectors to share a common representation, the gradient vanishing problem can be largely alleviated. The results also show that the augmented data generated by the proposed networks are rich in emotion information. Thus, the resulting emotion classifiers are competitive with state-of-the-art speech emotion recognition systems.
Pages: 172-184
Number of pages: 13
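The abstract describes adding synthetic class-dependent feature vectors to the real training data before training a classifier. The sketch below illustrates only that final pooling step in Python with NumPy. It is not the ADAN itself: the function `synthesize_features` is a hypothetical stand-in that draws samples around each class mean, where the paper would use its trained generator, and the names `augment_training_set`, `n_per_class`, and `noise_std` are assumptions made for illustration.

```python
import numpy as np


def synthesize_features(class_means, class_index, n, noise_std, rng):
    """Hypothetical stand-in for a trained class-conditional generator.

    Draws n vectors around the mean of one class. A real system would
    sample from the trained generator instead (assumption for illustration).
    """
    dim = class_means.shape[1]
    noise = rng.normal(0.0, noise_std, size=(n, dim))
    return class_means[class_index] + noise


def augment_training_set(x_real, y_real, n_per_class, noise_std=0.1, seed=0):
    """Return the real data followed by n_per_class synthetic vectors per class."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_real)
    # One mean vector per class, in the same order as `classes`.
    class_means = np.stack([x_real[y_real == c].mean(axis=0) for c in classes])

    x_parts, y_parts = [x_real], [y_real]
    for idx, c in enumerate(classes):
        x_parts.append(
            synthesize_features(class_means, idx, n_per_class, noise_std, rng)
        )
        # Synthetic vectors inherit the label of the class they were drawn for.
        y_parts.append(np.full(n_per_class, c, dtype=y_real.dtype))
    return np.concatenate(x_parts), np.concatenate(y_parts)
```

Calling `augment_training_set(x_real, y_real, n_per_class=5)` on a labeled feature matrix returns the original rows first, followed by five labeled synthetic rows per class, so any downstream classifier can be trained on the pooled set without further changes.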