SPEECH EMOTION RECOGNITION WITH GLOBAL-AWARE FUSION ON MULTI-SCALE FEATURE REPRESENTATION

Cited by: 43
Authors
Zhu, Wenjing [1]
Li, Xiang [1]
Affiliations
[1] Du Xiaoman, Beijing, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Speech Emotion Recognition; Attention Mechanism; Multi-scale Features; Feature Fusion;
DOI
10.1109/ICASSP43922.2022.9747517
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Speech Emotion Recognition (SER) is a fundamental task that predicts the emotion label from speech data. Recent works mostly use convolutional neural networks (CNNs) to learn local attention maps on fixed-scale feature representations by viewing time-varying spectral features as images. However, existing CNNs for SER fail to capture both the rich emotional features present at different scales and important global information. In this paper, we propose a novel GLobal-Aware Multi-scale (GLAM) neural network that learns multi-scale feature representations and uses a global-aware fusion module to attend to emotional information. Specifically, GLAM iteratively applies multiple convolutional kernels of different scales to learn multiple feature representations. Then, instead of using attention-based methods, a simple but effective global-aware fusion module is applied to capture the most important emotional information globally. Experiments on the benchmark corpus IEMOCAP over four emotions demonstrate the superiority of the proposed model, with 2.5% to 4.5% improvements on four common metrics compared to previous state-of-the-art approaches.
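As a rough illustration of the two ideas named in the abstract (features extracted at several kernel scales, then fused using information gathered over the whole input), a toy Python sketch is given here. It is not the GLAM architecture: the 1-D averaging kernels, the widths (1, 3, 5), the energy-based weighting, and the function names `conv1d_same`, `multi_scale_features`, and `global_fusion` are all assumptions made for illustration only.

```python
# Toy sketch only: multi-scale 1-D filtering followed by a globally computed
# weighting of the scales. Kernel shapes and the fusion rule are assumptions,
# not the method described in the paper.

def conv1d_same(signal, kernel):
    """Zero-padded 'same' cross-correlation of a list with an odd-length kernel."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def multi_scale_features(signal, widths=(1, 3, 5)):
    """Apply one averaging kernel per width; return one feature row per scale."""
    return [conv1d_same(signal, [1.0 / w] * w) for w in widths]

def global_fusion(features):
    """Weight each scale by its share of the total absolute activation,
    computed over all positions, then sum the weighted scales."""
    energy = [sum(abs(v) for v in row) for row in features]
    total = sum(energy) or 1.0  # avoid division by zero for an all-zero input
    weights = [e / total for e in energy]
    length = len(features[0])
    fused = [
        sum(w * row[t] for w, row in zip(weights, features))
        for t in range(length)
    ]
    return fused, weights

if __name__ == "__main__":
    frame_energies = [0.0, 0.0, 1.0, 0.0, 0.0]  # hypothetical input sequence
    fused, weights = global_fusion(multi_scale_features(frame_energies))
    print(fused, weights)
```

The point of the sketch is the structure: each scale sees a different neighbourhood of the input, while the fusion weights depend on every position at once rather than on a local window.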
Pages: 6437-6441
Number of pages: 5
Related papers
19 in total
[1]  
Badshah AM, 2017, 2017 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON), P125
[2]   IEMOCAP: interactive emotional dyadic motion capture database [J].
Busso, Carlos ;
Bulut, Murtaza ;
Lee, Chi-Chun ;
Kazemzadeh, Abe ;
Mower, Emily ;
Kim, Samuel ;
Chang, Jeannette N. ;
Lee, Sungbok ;
Narayanan, Shrikanth S. .
LANGUAGE RESOURCES AND EVALUATION, 2008, 42 (04) :335-359
[3]  
Chen Mingyi, 2010, IEEE SIGNAL PROCESSI, V25, P1440
[4]   Emotion recognition and affective computing on vocal social media [J].
Dai, Weihui ;
Han, Dongmei ;
Dai, Yonghui ;
Xu, Dongrong .
INFORMATION & MANAGEMENT, 2015, 52 (07) :777-788
[5]   Survey on speech emotion recognition: Features, classification schemes, and databases [J].
El Ayadi, Moataz ;
Kamel, Mohamed S. ;
Karray, Fakhri .
PATTERN RECOGNITION, 2011, 44 (03) :572-587
[6]  
El Ayadi MMH, 2007, INT CONF ACOUST SPEE, P957
[7]  
Han Kun, 2014, 2014 ANN C INT SPEEC
[8]  
Li Pengcheng, 2018, 2018 ANN C INT SPEEC
[9]  
Liu Hanxiao, 1950, PAY ATTENTION MLPS
[10]  
Mustaqeem, 2020, SENSORS, V20