ISNet: Individual Standardization Network for Speech Emotion Recognition

Cited by: 23
Authors
Fan, Weiquan [1 ]
Xu, Xiangmin [1 ]
Cai, Bolun [1 ]
Xing, Xiaofen [1 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech recognition; Emotion recognition; Feature extraction; Benchmark testing; Standardization; Speech processing; Task analysis; Individual standardization network (ISNet); speech emotion recognition; individual differences; metric; dataset; CLASSIFICATION; ATTENTION; FEATURES; VOICE;
DOI
10.1109/TASLP.2022.3171965
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Speech emotion recognition plays an essential role in human-computer interaction. However, cross-individual representation learning and individual-agnostic systems are challenging due to the distribution deviation caused by individual differences. Most existing approaches use an auxiliary speaker-recognition task to eliminate individual differences. Unfortunately, although such methods can reduce interindividual voiceprint differences, they struggle to dissociate interindividual expression differences, since each individual has unique expression habits. In this paper, we propose an individual standardization network (ISNet) for speech emotion recognition to alleviate the interindividual emotion confusion caused by individual differences. Specifically, we model individual benchmarks as representations of nonemotional neutral speech, and ISNet performs individual standardization against the automatically generated benchmark, which improves the robustness of individual-agnostic emotion representations. In response to individual differences, we also propose more comprehensive and meaningful individual-level evaluation metrics. In addition, we continue our previous work by constructing a challenging large-scale speech emotion dataset (LSSED), and we propose a more reasonable division of the training and testing sets that prevents individual information leakage. Experimental results on datasets of both large and small scale demonstrate the effectiveness of ISNet, which achieves new state-of-the-art performance under the same experimental conditions on IEMOCAP and LSSED.
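The core idea in the abstract can be illustrated with a minimal sketch: standardize each utterance's representation against a per-speaker "benchmark" built from that speaker's neutral speech. The paper generates this benchmark automatically inside the network; here, as a simplifying assumption, we just average each speaker's neutral-utterance feature vectors, so this is an illustration of the standardization concept, not the authors' exact method.

```python
# Illustrative sketch (NOT the ISNet implementation): build a per-speaker
# benchmark from neutral speech and subtract it from every utterance's
# features, yielding more individual-agnostic emotion representations.
import numpy as np

def neutral_benchmarks(features, speakers, labels, neutral_label=0):
    """Average each speaker's neutral-speech feature vectors (assumed proxy
    for the automatically generated benchmark in the paper)."""
    benchmarks = {}
    for spk in set(speakers):
        mask = np.array([s == spk and y == neutral_label
                         for s, y in zip(speakers, labels)])
        benchmarks[spk] = features[mask].mean(axis=0)
    return benchmarks

def standardize(features, speakers, benchmarks):
    """Subtract each speaker's benchmark from that speaker's utterances."""
    return np.stack([f - benchmarks[s] for f, s in zip(features, speakers)])

# Toy usage: two speakers, 4-D features, label 0 = neutral.
feats = np.array([[1., 1., 1., 1.],   # speaker A, neutral
                  [3., 3., 3., 3.],   # speaker A, emotional
                  [2., 2., 2., 2.],   # speaker B, neutral
                  [5., 5., 5., 5.]])  # speaker B, emotional
spks = ["A", "A", "B", "B"]
labels = [0, 1, 0, 1]
bench = neutral_benchmarks(feats, spks, labels)
out = standardize(feats, spks, bench)  # speaker offsets removed
```

After standardization, the neutral utterances map to zero and the emotional utterances are expressed as deviations from each speaker's own baseline, which is the intuition behind alleviating interindividual emotion confusion.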
Pages: 1803-1814 (12 pages)