ISNet: Individual Standardization Network for Speech Emotion Recognition

Cited by: 23

Authors:

Fan, Weiquan [1]

Xu, Xiangmin [1]

Cai, Bolun [1]

Xing, Xiaofen [1]

Affiliations:

[1] South China Univ Technol, Sch Elect & Informat, Guangzhou 510640, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2022 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Speech recognition; Emotion recognition; Feature extraction; Benchmark testing; Standardization; Speech processing; Task analysis; Individual standardization network (ISNet); speech emotion recognition; individual differences; metric; dataset; CLASSIFICATION; ATTENTION; FEATURES; VOICE;

DOI:

10.1109/TASLP.2022.3171965

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Speech emotion recognition plays an essential role in human-computer interaction. However, cross-individual representation learning and individual-agnostic systems are challenging due to the distribution deviation caused by individual differences. The existing related approaches mostly use the auxiliary task of speaker recognition to eliminate individual differences. Unfortunately, although these methods can reduce interindividual voiceprint differences, it is difficult to dissociate interindividual expression differences since each individual has its unique expression habits. In this paper, we propose an individual standardization network (ISNet) for speech emotion recognition to alleviate the problem of interindividual emotion confusion caused by individual differences. Specifically, we model individual benchmarks as representations of nonemotional neutral speech, and ISNet realizes individual standardization using the automatically generated benchmark, which improves the robustness of individual-agnostic emotion representations. In response to individual differences, we also propose more comprehensive and meaningful individual-level evaluation metrics. In addition, we continue our previous work to construct a challenging large-scale speech emotion dataset (LSSED). We propose a more reasonable division method of the training set and testing set to prevent individual information leakage. Experimental results on datasets of both large and small scales have proven the effectiveness of ISNet, and the new state-of-the-art performance is achieved under the same experimental conditions on IEMOCAP and LSSED.
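The two ideas in the abstract, standardizing each individual against a neutral-speech benchmark and scoring at the individual level, can be illustrated with a minimal Python sketch. This is not the ISNet architecture: the abstract says ISNet generates the benchmark automatically with a network, whereas the sketch below simply assumes labelled neutral utterances are available and uses their per-speaker mean feature vector as the benchmark. The function names, the `"neutral"` label, and the choice of per-speaker accuracy averaged over speakers as the metric are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def standardize_by_speaker(features, speakers, labels, neutral_label="neutral"):
    """Subtract each speaker's neutral benchmark from that speaker's features.

    Assumption: the benchmark is the mean feature vector of the speaker's
    neutral utterances (a stand-in for a learned, generated benchmark).
    """
    neutral_feats = defaultdict(list)
    for feat, spk, lab in zip(features, speakers, labels):
        if lab == neutral_label:
            neutral_feats[spk].append(feat)

    # One benchmark vector per speaker: column-wise mean of neutral features.
    benchmark = {
        spk: [mean(list(col)) for col in zip(*feats)]
        for spk, feats in neutral_feats.items()
    }

    standardized = []
    for feat, spk in zip(features, speakers):
        base = benchmark[spk]  # raises KeyError if a speaker has no neutral sample
        standardized.append([x - b for x, b in zip(feat, base)])
    return standardized


def individual_level_accuracy(y_true, y_pred, speakers):
    """Average of per-speaker accuracies, so each speaker counts equally."""
    hits = defaultdict(list)
    for t, p, s in zip(y_true, y_pred, speakers):
        hits[s].append(t == p)
    return mean([sum(v) / len(v) for v in hits.values()])
```

Averaging per speaker rather than per utterance keeps speakers with many recordings from dominating the score, which is one plausible reading of an "individual-level" metric; the paper's actual definitions may differ.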


Pages: 1803-1814

Page count: 12

