Cascaded Adversarial Learning for Speaker Independent Emotion Recognition

Cited by: 0
Authors
Lekamalage, Chamara Kasun Liyanaarachchi [1]
Lin, Zhiping [1]
Huang, Guang-Bin [2]
Rajapakse, Jagath Chandana [3]
Affiliations
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore
[2] Mindpointeye, 18-06-07-08 Vis Exchange,2 Venture Dr, Singapore, Singapore
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Funding
National Research Foundation, Singapore;
Keywords
cascade networks; autoencoder; speaker independent emotion recognition (SIER); adversarial learning (AL); deep learning;
DOI
10.1109/IJCNN55064.2022.9892223
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In contrast to traditional adversarial learning (AL), which learns speaker-invariant representations, this paper proposes cascaded adversarial learning (CAL), which learns speaker-invariant emotion data for speaker independent emotion recognition (SIER) tasks. CAL is a dual cascaded network architecture in which the output of the transformation network is fed as input to the classification network. The transformation network transforms the original speech emotion data into speaker-invariant emotion data by implementing an AL strategy with an encoder-decoder architecture. The classification network predicts the emotion from the speaker-invariant emotion data (the output of the transformation network). We argue that the speaker-invariant emotion data produced by the transformation network has less variation than the original speech emotion data and is therefore conducive to SIER, as it improves generalization capability. To our knowledge, this is the first time a dual cascaded network has been used for SIER, and we demonstrate state-of-the-art performance for SIER on the Emo-DB and RAVDESS datasets.
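The cascade described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the layer sizes, the single-layer encoder and decoder, the reconstruction term, the placement of the speaker adversary on the transformed data, and every name below (`transform`, `cal_forward`, `cal_losses`, `lam`) are assumptions made only to show how the transformation network's output feeds the classification network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder sizes, chosen only for illustration (not taken from the paper).
N_FEATURES, N_LATENT, N_SPEAKERS, N_EMOTIONS = 40, 16, 10, 7


def dense(n_in, n_out):
    """Return randomly initialised (weights, bias) for one linear layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer labels under probs."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


enc = dense(N_FEATURES, N_LATENT)    # transformation network: encoder
dec = dense(N_LATENT, N_FEATURES)    # transformation network: decoder
spk = dense(N_FEATURES, N_SPEAKERS)  # speaker adversary (assumed placement)
clf = dense(N_FEATURES, N_EMOTIONS)  # classification network


def transform(x):
    """Encoder-decoder pass: input features -> transformed emotion data."""
    h = np.tanh(x @ enc[0] + enc[1])
    return h @ dec[0] + dec[1]


def cal_forward(x):
    """Cascade: the transformation output is the classifier's input."""
    x_inv = transform(x)
    speaker_probs = softmax(x_inv @ spk[0] + spk[1])
    emotion_probs = softmax(x_inv @ clf[0] + clf[1])
    return x_inv, speaker_probs, emotion_probs


def cal_losses(x, speakers, emotions, lam=1.0):
    """Loss terms for one batch; lam is a hypothetical adversarial weight."""
    x_inv, speaker_probs, emotion_probs = cal_forward(x)
    reconstruction = float(np.mean((x_inv - x) ** 2))
    speaker = cross_entropy(speaker_probs, speakers)
    emotion = cross_entropy(emotion_probs, emotions)
    return {
        "reconstruction": reconstruction,
        "speaker": speaker,  # the adversary minimises this term
        "emotion": emotion,  # the classification network minimises this term
        # The transformation network would minimise reconstruction error
        # while making the speaker adversary's task harder (assumed form).
        "transform_objective": reconstruction - lam * speaker,
    }
```

In this sketch only the forward pass and the loss terms are shown; an actual training loop would alternate updates between the adversary and the transformation network, and the paper may combine the terms differently.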
Pages: 8
Related papers
50 in total
  • [1] Discriminative Adversarial Learning for Speaker Independent Emotion Recognition
    Kasun, Chamara
    Ahn, Chung Soo
    Rajapakse, Jagath C.
    Lin, Zhiping
    Huang, Guang-Bin
    INTERSPEECH 2022, 2022, : 4975 - 4979
  • [2] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
    Md Shah Fahad
    Ashish Ranjan
    Akshay Deepak
    Gayadhar Pradhan
    Circuits, Systems, and Signal Processing, 2022, 41 : 6113 - 6135
  • [3] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
    Fahad, Md Shah
    Ranjan, Ashish
    Deepak, Akshay
    Pradhan, Gayadhar
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (11) : 6113 - 6135
  • [4] Graph Learning Based Speaker Independent Speech Emotion Recognition
    Xu, Xinzhou
    Huang, Chengwei
    Wu, Chen
    Wang, Qingyun
    Zhao, Li
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2014, 14 (02) : 17 - 22
  • [5] Contrastive Adversarial Learning for Person Independent Facial Emotion Recognition
    Kim, Daeha
    Song, Byung Cheol
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 5948 - 5956
  • [6] COMPARISON OF SPEAKER DEPENDENT AND SPEAKER INDEPENDENT EMOTION RECOGNITION
    Rybka, Jan
    Janicki, Artur
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2013, 23 (04) : 797 - 808
  • [7] BAYESIAN ADVERSARIAL LEARNING FOR SPEAKER RECOGNITION
    Chien, Jen-Tzung
    Kuo, Chun-Lin
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 381 - 388
  • [8] ADVERSARIAL MANIFOLD LEARNING FOR SPEAKER RECOGNITION
    Chien, Jen-Tzung
    Peng, Kang-Ting
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 599 - 605
  • [9] Neural adversarial learning for speaker recognition
    Chien, Jen-Tzung
    Peng, Kang-Ting
    COMPUTER SPEECH AND LANGUAGE, 2019, 58 : 422 - 440
  • [10] Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition
    Lu, Cheng
    Zong, Yuan
    Zheng, Wenming
    Li, Yang
    Tang, Chuangao
    Schuller, Bjoern W.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2217 - 2230