TWACapsNet: a capsule network with two-way attention mechanism for speech emotion recognition

Cited by: 0

Authors:

Wen, Xin-Cheng [1]

Liu, Kun-Hong [2]

Luo, Yan [3]

Ye, Jiaxin [4]

Chen, Liyan [2]

Affiliations:

[1] Harbin Inst Technol Shenzhen, Dept Comp Sci, Shenzhen, Peoples R China

[2] Xiamen Univ, Sch Film, Xiamen, Peoples R China

[3] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China

[4] Fudan Univ, Inst Sci & Technol Brain inspired Intelligence, Shanghai, Peoples R China

Source:

SOFT COMPUTING | 2023 / Vol. 28 / Issue 15-16

Funding:

National Natural Science Foundation of China;

Keywords:

Speech emotion recognition; Attention mechanism; Neural networks; FEATURES; CNN;

DOI:

10.1007/s00500-023-08957-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech emotion recognition (SER) is a challenging task, and a typical convolutional neural network (CNN) cannot handle speech data well directly, because a CNN tends to capture local information and ignore the overall characteristics. This paper proposes a Capsule Network with Two-Way Attention Mechanism (TWACapsNet for short) for the SER problem. TWACapsNet accepts spatial and spectral features as inputs, and a convolutional layer and a capsule layer are deployed to process these two types of features separately, in two ways. After that, two attention mechanisms are designed to enhance the information obtained from the spatial and spectral features. Finally, the results of the two ways are combined to form the final decision. The advantage of TWACapsNet is verified by experiments on multiple SER data sets, and the experimental results show that the proposed method outperforms widely deployed neural network models on three typical SER data sets. Furthermore, the combination of the two ways contributes to the higher and more stable performance of TWACapsNet.


Pages: 8701-8713

Page count: 13

