ISNet: Individual Standardization Network for Speech Emotion Recognition

Cited by: 23
Authors
Fan, Weiquan [1]
Xu, Xiangmin [1]
Cai, Bolun [1]
Xing, Xiaofen [1]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech recognition; Emotion recognition; Feature extraction; Benchmark testing; Standardization; Speech processing; Task analysis; Individual standardization network (ISNet); speech emotion recognition; individual differences; metric; dataset; CLASSIFICATION; ATTENTION; FEATURES; VOICE;
DOI
10.1109/TASLP.2022.3171965
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Speech emotion recognition plays an essential role in human-computer interaction. However, cross-individual representation learning and individual-agnostic systems are challenging due to the distribution deviation caused by individual differences. Most existing approaches use an auxiliary speaker-recognition task to eliminate individual differences. Unfortunately, although these methods can reduce interindividual voiceprint differences, they struggle to dissociate interindividual expression differences, since each individual has unique expression habits. In this paper, we propose an individual standardization network (ISNet) for speech emotion recognition to alleviate the interindividual emotion confusion caused by individual differences. Specifically, we model individual benchmarks as representations of nonemotional neutral speech, and ISNet performs individual standardization using the automatically generated benchmark, which improves the robustness of individual-agnostic emotion representations. In response to individual differences, we also propose more comprehensive and meaningful individual-level evaluation metrics. In addition, we continue our previous work on constructing a challenging large-scale speech emotion dataset (LSSED) and propose a more reasonable division of the training and testing sets to prevent individual information leakage. Experimental results on datasets of both large and small scale demonstrate the effectiveness of ISNet, which achieves new state-of-the-art performance under the same experimental conditions on IEMOCAP and LSSED.
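The abstract describes individual standardization only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes the per-speaker benchmark is simply the mean embedding of that speaker's neutral utterances and that standardization removes this benchmark from an utterance embedding before emotion classification; ISNet itself generates the benchmark automatically and learns the process end to end. The function names and the subtraction step are hypothetical choices made for illustration.

import numpy as np

def speaker_benchmark(neutral_embeddings: np.ndarray) -> np.ndarray:
    # Average a speaker's neutral-speech embeddings into a single benchmark
    # vector (a stand-in for the benchmark ISNet generates automatically).
    return neutral_embeddings.mean(axis=0)

def standardize(utterance_embedding: np.ndarray, benchmark: np.ndarray) -> np.ndarray:
    # Remove the individual benchmark so the residual is closer to an
    # individual-agnostic emotion representation (subtraction is only one
    # simple, assumed choice for this operation).
    return utterance_embedding - benchmark

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 128
    neutral = rng.normal(size=(10, dim))              # embeddings of one speaker's neutral utterances
    speaker_bias = neutral.mean(axis=0)
    utterance = rng.normal(size=dim) + speaker_bias   # an emotional utterance carrying the same speaker bias
    standardized = standardize(utterance, speaker_benchmark(neutral))
    print(standardized.shape)                         # (128,)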
Pages: 1803-1814
Page count: 12
Related Papers
50 items in total
  • [31] Speech emotion recognition using the novel PEmoNet (Parallel Emotion Network). Bhangale, Kishor B.; Kothandaraman, Mohanaprasad. Applied Acoustics, 2023, 212.
  • [32] Speech Emotion Recognition Using Generative Adversarial Network and Deep Convolutional Neural Network. Bhangale, Kishor; Kothandaraman, Mohanaprasad. Circuits, Systems, and Signal Processing, 2024, 43(04): 2341-2384.
  • [33] MaxMViT-MLP: Multiaxis and Multiscale Vision Transformers Fusion Network for Speech Emotion Recognition. Ong, Kah Liang; Lee, Chin Poo; Lim, Heng Siong; Lim, Kian Ming; Alqahtani, Ali. IEEE Access, 2024, 12: 18237-18250.
  • [34] Application of probabilistic neural network for speech emotion recognition. Deshmukh, S.; Gupta, P. International Journal of Speech Technology, 2024, 27(01): 19-28.
  • [35] Speech Emotion Recognition via Sparse Learning-Based Fusion Model. Min, Dong-Jin; Kim, Deok-Hwan. IEEE Access, 2024, 12: 177219-177235.
  • [36] Learning With Rater-Expanded Label Space to Improve Speech Emotion Recognition. Upadhyay, Shreya G.; Chien, Woan-Shiuan; Su, Bo-Hao; Lee, Chi-Chun. IEEE Transactions on Affective Computing, 2024, 15(03): 1539-1552.
  • [37] Multi-View Speech Emotion Recognition Via Collective Relation Construction. Hou, Mixiao; Zhang, Zheng; Cao, Qi; Zhang, David; Lu, Guangming. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 30: 218-229.
  • [38] Speech Emotion Recognition by Late Fusion for Bidirectional Reservoir Computing With Random Projection. Ibrahim, Hemin; Loo, Chu Kiong; Alnajjar, Fady. IEEE Access, 2021, 9: 122855-122871.
  • [39] A Pattern Mining Approach for Improving Speech Emotion Recognition. Avci, Umut. International Journal of Pattern Recognition and Artificial Intelligence, 2022, 36(14).
  • [40] Analysis of Oral Exams With Speaker Diarization and Speech Emotion Recognition: A Case Study. Beccaro, Wesley; Ramirez, Miguel Arjona; Liaw, William; Guimaraes, Heitor Rodrigues. IEEE Transactions on Education, 2024, 67(01): 74-86.