TWACapsNet: a capsule network with two-way attention mechanism for speech emotion recognition

Cited by: 0
Authors
Wen, Xin-Cheng [1 ]
Liu, Kun-Hong [2 ]
Luo, Yan [3 ]
Ye, Jiaxin [4 ]
Chen, Liyan [2 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci, Shenzhen, Peoples R China
[2] Xiamen Univ, Sch Film, Xiamen, Peoples R China
[3] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
[4] Fudan Univ, Inst Sci & Technol Brain inspired Intelligence, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech emotion recognition; Attention mechanism; Neural networks; FEATURES; CNN;
DOI
10.1007/s00500-023-08957-5
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Code
081104; 0812; 0835; 1405;
Abstract
Speech Emotion Recognition (SER) is a challenging task, and a typical convolutional neural network (CNN) cannot handle speech data well on its own, because a CNN tends to capture local information while ignoring global characteristics. This paper proposes a Capsule Network with a Two-Way Attention Mechanism (TWACapsNet for short) for the SER problem. TWACapsNet accepts spatial and spectral features as inputs, and a convolutional layer and a capsule layer process these two types of features in two separate ways. Two attention mechanisms are then designed to enhance the information obtained from the spatial and spectral features, and the outputs of the two ways are finally combined to form the final decision. The advantage of TWACapsNet is verified by experiments on multiple SER data sets, and the results show that the proposed method outperforms widely deployed neural network models on three typical SER data sets. Furthermore, the combination of the two ways yields higher and more stable performance for TWACapsNet.
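For concreteness, below is a minimal PyTorch sketch of the two-way idea described in the abstract: two parallel branches (one for spatial features, one for spectral features), each with a convolutional stage, a simplified capsule-style projection using the usual squashing nonlinearity, and an attention weighting over capsules, fused for the final decision. The class names (Branch, TwoWayNet), layer sizes, the capsule layer, and the attention form are illustrative assumptions and do not reproduce the authors' TWACapsNet implementation.

```python
# Illustrative sketch only; not the authors' TWACapsNet code.
import torch
import torch.nn as nn


def squash(x, dim=-1, eps=1e-8):
    """Squashing nonlinearity commonly used in capsule networks."""
    sq_norm = (x ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * x / torch.sqrt(sq_norm + eps)


class Branch(nn.Module):
    """One processing way: conv stage -> capsule-style projection -> attention pooling."""

    def __init__(self, in_ch=1, n_caps=32, caps_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.to_caps = nn.Linear(64 * 8 * 8, n_caps * caps_dim)
        self.attn = nn.Linear(caps_dim, 1)  # simple additive attention over capsules
        self.n_caps, self.caps_dim = n_caps, caps_dim

    def forward(self, x):
        h = self.conv(x).flatten(1)                                  # (B, 64*8*8)
        caps = squash(self.to_caps(h).view(-1, self.n_caps, self.caps_dim))
        w = torch.softmax(self.attn(caps), dim=1)                    # (B, n_caps, 1)
        return (w * caps).sum(dim=1)                                 # attention-pooled capsule vector


class TwoWayNet(nn.Module):
    """Fuses the two attention-enhanced ways into one classification decision."""

    def __init__(self, n_classes=4):
        super().__init__()
        self.spatial_branch = Branch(in_ch=1)
        self.spectral_branch = Branch(in_ch=1)
        self.classifier = nn.Linear(2 * 16, n_classes)

    def forward(self, spatial_feat, spectral_feat):
        fused = torch.cat(
            [self.spatial_branch(spatial_feat), self.spectral_branch(spectral_feat)],
            dim=1,
        )
        return self.classifier(fused)


if __name__ == "__main__":
    # Dummy spectrogram-like inputs of shape (batch, 1, time, frequency).
    model = TwoWayNet(n_classes=4)
    spatial = torch.randn(2, 1, 64, 64)
    spectral = torch.randn(2, 1, 64, 64)
    print(model(spatial, spectral).shape)  # torch.Size([2, 4])
```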
Pages: 8701-8713
Number of pages: 13