Speech Emotion Recognition using Context-Aware Dilated Convolution Network

Cited by: 5

Authors:

Kakuba, Samuel [1]

Han, Dong Seog [2]

Affiliations:

[1] Kyungpook Natl Univ, Grad Sch Elect & Elect Engn, Daegu, South Korea

[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu, South Korea

Source:

2022 27TH ASIA PACIFIC CONFERENCE ON COMMUNICATIONS (APCC 2022): CREATING INNOVATIVE COMMUNICATION TECHNOLOGIES FOR POST-PANDEMIC ERA | 2022

Keywords:

context-aware emotion recognition; multi-head attention; dilated convolution

DOI:

10.1109/APCC55198.2022.9943771

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Deep learning-based speech emotion recognition has been applied to social living assistance, health monitoring, authentication, and other human-to-machine interaction applications. Because these applications are ubiquitous, computationally efficient and robust speech emotion recognition models are required. Speech signals require tracking time steps and analyzing long-term dependencies, the context of utterances, and spatial cues. Recurrent neural networks such as long short-term memory and gated recurrent units, coupled with attention mechanisms, are often used to capture long-term dependencies and context in the speech signal. However, they do not account for the spatial cues that may exist in the speech signal. Moreover, most of these systems operate sequentially, which causes slow convergence and sluggish training. Therefore, we propose a model that employs dilated convolution layers in combination with hybrid attention mechanisms. The model uses multi-head attention to extract the global context from the feature representations, which are then fed into a bidirectional long short-term memory network configured with self-attention to further handle context and long-term dependencies. The model uses spectral and voice quality features extracted from the raw speech signals as input. The proposed model achieves comparable performance in terms of F1 score and accuracy. Its performance is also presented in terms of confusion matrices.
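
The abstract names dilated convolution layers but does not give their configuration. As a rough illustration of why dilation helps capture longer context without recurrence, the sketch below implements a plain 1-D dilated convolution and computes the receptive field of a stack of such layers. The kernel values, the kernel size of 3, and the dilation rates 1, 2, 4, 8 are assumptions chosen for illustration, not values taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D dilated convolution (cross-correlation form, stride 1)."""
    span = (len(kernel) - 1) * dilation  # distance between first and last kernel tap
    out = []
    for t in range(len(signal) - span):
        # Each tap reads the input `dilation` frames apart instead of adjacent frames.
        out.append(sum(w * signal[t + i * dilation] for i, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output of stacked stride-1 dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical input: one feature channel over 16 time frames.
frames = [float(i) for i in range(16)]
kernel = [1.0, 0.0, -1.0]  # hypothetical 3-tap difference kernel

y1 = dilated_conv1d(frames, kernel, dilation=1)  # compares frames 2 apart
y4 = dilated_conv1d(frames, kernel, dilation=4)  # compares frames 8 apart

print(len(y1), len(y4))
print(receptive_field(3, [1, 1, 1, 1]))  # four ordinary layers
print(receptive_field(3, [1, 2, 4, 8]))  # four layers with assumed dilation rates
```

With the same number of layers and weights, the dilated stack in this sketch covers 31 input frames versus 9 for the undilated one, and every output position can be computed in parallel, unlike the step-by-step processing of a recurrent layer.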


Pages: 601-604

Page count: 4
