Two-level discriminative speech emotion recognition model with wave field dynamics: A personalized speech emotion recognition method

Cited by: 3

Authors:

Jia, Ning [1]

Zheng, Chunjun [1]

Affiliations:

[1] Dalian Neusoft Univ Informat, Sch Software, Dalian, Peoples R China

Source:

COMPUTER COMMUNICATIONS | 2021 / Vol. 180

Keywords:

Speech emotion recognition; Speaker classification; Wave field dynamics; Cross medium; Convolutional recurrent neural network; Two-level discriminative model;

DOI:

10.1016/j.comcom.2021.09.013

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Currently available speech emotion recognition (SER) methods generally rely on a single SER model, and improving SER accuracy depends on both the feature extraction method and the model design. However, the generalization performance of such models is typically poor because the emotional features of different speakers can vary substantially. The present work addresses this issue by applying a two-level discriminative model to the SER task. The first level assigns an individual speaker to a specific speaker group according to the speaker's characteristics. The second level constructs a personalized SER model for each group of speakers using the wave field dynamics model and a dual-channel general SER model. The two levels are fused in an ensemble learning scheme to achieve effective SER classification. Experiments on the interactive emotional dyadic motion capture (IEMOCAP) corpus and a custom-built SER corpus show that the proposed method provides higher SER accuracy. On the IEMOCAP corpus, the proposed model improves recognition accuracy by 7%. On the custom-built SER corpus, both masked and unmasked speakers are employed to demonstrate that the proposed method maintains higher SER accuracy.
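
The two-level flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the nearest-centroid grouping stands in for the speaker classifier, the lambda models stand in for the wave field dynamics and dual-channel SER models, and the weighted average stands in for the ensemble fusion. All names (`nearest_group`, `fuse`, `predict_emotion`) and the toy values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], List[float]]  # features -> class probabilities


def nearest_group(speaker_vec: Vector, centroids: Dict[str, Vector]) -> str:
    """Level 1 (stand-in): pick the speaker group with the closest centroid."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda g: sq_dist(speaker_vec, centroids[g]))


def fuse(p_a: List[float], p_b: List[float], weight: float = 0.5) -> List[float]:
    """Assumed fusion rule: weighted average of two probability vectors."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_a, p_b)]


def predict_emotion(speaker_vec: Vector, utterance_vec: Vector,
                    centroids: Dict[str, Vector],
                    group_models: Dict[str, Model],
                    general_model: Model,
                    labels: List[str]) -> str:
    """Level 2: fuse the group-specific and general models, return the top label."""
    group = nearest_group(speaker_vec, centroids)
    fused = fuse(group_models[group](utterance_vec), general_model(utterance_vec))
    return labels[max(range(len(fused)), key=fused.__getitem__)]


# Toy data (hypothetical): two speaker groups, two emotion classes.
labels = ["neutral", "angry"]
centroids = {"g0": [0.0, 0.0], "g1": [1.0, 1.0]}
group_models = {"g0": lambda x: [0.9, 0.1], "g1": lambda x: [0.2, 0.8]}
general_model = lambda x: [0.5, 0.5]

print(predict_emotion([0.9, 1.1], [0.3], centroids, group_models, general_model, labels))
print(predict_emotion([0.1, -0.1], [0.3], centroids, group_models, general_model, labels))
```

In this sketch the speaker embedding only selects which personalized model is consulted, while the general model contributes to every prediction; the actual grouping criterion and fusion scheme would have to be taken from the paper itself.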

Citation

Pages: 161-170

Page count: 10

References: 36 in total

