Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition

Cited by: 12
Authors
Latif, Siddique [1 ,2 ]
Rana, Rajib [1 ]
Khalifa, Sara [2 ,3 ]
Jurdak, Raja [4 ]
Schuller, Bjoern W. [5 ,6 ]
Affiliations
[1] Univ Southern Queensland, Toowoomba, Qld, Australia
[2] CSIRO, Data61, Distributed Sensing Syst Grp, Canberra, ACT, Australia
[3] Univ New South Wales, Sydney, NSW, Australia
[4] Queensland Univ Technol, Brisbane, Qld, Australia
[5] Imperial Coll London, GLAM Grp Language Audio & Mus, London, England
[6] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, Augsburg, Germany
Source
INTERSPEECH 2020 | 2020
Keywords
speech emotion; mixup; data augmentation; convolutional neural networks; DenseNet; highway network; NETWORK;
DOI
10.21437/Interspeech.2020-3190
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification code
100104 ; 100213 ;
Abstract
Speech emotion recognition (SER) systems can achieve high accuracy when the training and test data are identically distributed, but this assumption is frequently violated in practice, and the performance of SER systems plummets under unforeseen data shifts. Designing robust models for accurate SER is challenging, which limits the use of SER in practical applications. In this paper, we propose a deeper neural network architecture in which we fuse a Dense Convolutional Network (DenseNet), long short-term memory (LSTM), and a Highway Network to learn powerful discriminative features that are robust to noise. We also propose data augmentation with our network architecture to further improve robustness. We comprehensively evaluate the architecture coupled with data augmentation against (1) noise, (2) adversarial attacks, and (3) cross-corpus settings. Our evaluations on the widely used IEMOCAP and MSP-IMPROV datasets show promising results when compared with existing studies and state-of-the-art models.
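The keyword list names mixup as the data augmentation, but this record does not describe how the authors apply it. As a rough illustration only, the standard mixup formulation blends pairs of training examples and their labels with a weight drawn from a Beta distribution. The sketch below assumes NumPy arrays of acoustic features and one-hot emotion labels; the function name `mixup_batch`, the default `alpha=0.2`, and the array shapes are hypothetical choices, not details taken from the paper.

```python
import numpy as np


def mixup_batch(features, labels_onehot, alpha=0.2, rng=None):
    """Blend each example with a randomly chosen partner from the same batch.

    Hypothetical helper: standard mixup, not the paper's exact procedure.
    features:      array of shape (batch, ...), e.g. spectrogram patches
    labels_onehot: array of shape (batch, num_classes)
    alpha:         Beta(alpha, alpha) concentration (assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing weight per batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Random pairing of examples within the batch.
    perm = rng.permutation(len(features))
    # Convex combination of inputs and of their labels.
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_x, mixed_y, lam
```

Because the mixed labels are convex combinations of one-hot vectors, each row still sums to one, so they can be used directly as soft targets with a cross-entropy loss.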
Pages: 2327-2331
Page count: 5
Related papers
50 in total
  • [1] Adversarial Domain Generalized Transformer for Cross-Corpus Speech Emotion Recognition
    Gao, Yuan
    Wang, Longbiao
    Liu, Jiaxing
    Dang, Jianwu
    Okada, Shogo
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (02) : 697 - 708
  • [2] A CROSS-CORPUS STUDY ON SPEECH EMOTION RECOGNITION
    Milner, Rosanna
    Jalal, Md Asif
    Ng, Raymond W. M.
    Hain, Thomas
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 304 - 311
  • [3] Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives
    Zhang, Shiqing
    Liu, Ruixin
    Tao, Xin
    Zhao, Xiaoming
    FRONTIERS IN NEUROROBOTICS, 2021, 15
  • [4] Analysis of Deep Learning Architectures for Cross-corpus Speech Emotion Recognition
    Parry, Jack
    Palaz, Dimitri
    Clarke, Georgia
    Lecomte, Pauline
    Mead, Rebecca
    Berger, Michael
    Hofer, Gregor
    INTERSPEECH 2019, 2019, : 1656 - 1660
  • [5] Deep Transductive Transfer Regression Network for Cross-Corpus Speech Emotion Recognition
    Zhao, Yan
    Wang, Jincen
    Ye, Ru
    Zong, Yuan
    Zheng, Wenming
    Zhao, Li
    INTERSPEECH 2022, 2022, : 371 - 375
  • [6] Multi-scale discrepancy adversarial network for cross-corpus speech emotion recognition
    Wanlu ZHENG
    Wenming ZHENG
    Yuan ZONG
    Virtual Reality & Intelligent Hardware, 2021, 3 (01) : 65 - 75
  • [7] Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG)
    Gideon, John
    McInnis, Melvin G.
    Provost, Emily Mower
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2021, 12 (04) : 1055 - 1068
  • [8] Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition
    Latif, Siddique
    Rana, Rajib
    Khalifa, Sara
    Jurdak, Raja
    Schuller, Bjorn
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (03) : 1912 - 1926
  • [9] Convolutional Auto-Encoder and Adversarial Domain Adaptation for Cross-Corpus Speech Emotion Recognition
    Wang, Yang
    Fu, Hongliang
    Tao, Huawei
    Yang, Jing
    Ge, Hongyi
    Xie, Yue
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (10) : 1803 - 1806
  • [10] Improved Cross-Corpus Speech Emotion Recognition Using Deep Local Domain Adaptation
    ZHAO Huijuan
    YE Ning
    WANG Ruchuan
    Chinese Journal of Electronics, 2023, 32 (03) : 640 - 646