Speech Emotion Recognition using Context-Aware Dilated Convolution Network

Cited by: 5
Authors
Kakuba, Samuel [1]
Han, Dong Seog [2]
Affiliations
[1] Kyungpook Natl Univ, Grad Sch Elect & Elect Engn, Daegu, South Korea
[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu, South Korea
Source
2022 27TH ASIA PACIFIC CONFERENCE ON COMMUNICATIONS (APCC 2022): CREATING INNOVATIVE COMMUNICATION TECHNOLOGIES FOR POST-PANDEMIC ERA | 2022
Keywords
context-aware emotion recognition; multi-head attention; dilated convolution
DOI
10.1109/APCC55198.2022.9943771
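Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract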
Deep learning-based speech emotion recognition has been applied to social living assistance, health monitoring, authentication, and other human-to-machine interaction applications. Because these applications are ubiquitous, computationally efficient and robust speech emotion recognition models are required. The nature of the speech signal requires tracking time steps and analyzing long-term dependencies, the contexts of the utterances, and the spatial cues. Recurrent neural networks such as long short-term memory and gated recurrent units, coupled with attention mechanisms, are often used to capture long-term dependencies and context in the speech signal. However, they do not account for the spatial cues that may exist in the speech signal. Moreover, most of these systems operate sequentially, which causes slow convergence and sluggish training. Therefore, we propose a model that employs dilated convolution layers in combination with hybrid attention mechanisms. The model uses multi-head attention to extract the global context in the feature representations, which are fed into a bidirectional long short-term memory network configured with self-attention to further handle the context and long-term dependencies. The model uses spectral and voice quality features extracted from the raw speech signals as input. The proposed model achieves comparable performance in terms of F1 score and accuracy. The proposed model's performance is also presented in terms of confusion matrices.
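As a rough illustration of why dilated convolutions help with long-range context, the sketch below implements a plain 1-D dilated convolution and computes the receptive field of a stack of such layers. It is not the authors' implementation: the abstract gives no kernel sizes or dilation rates, so the function names `dilated_conv1d` and `receptive_field` and all numeric values here are assumptions chosen for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D dilated convolution in cross-correlation form.

    Kernel taps are spaced `dilation` samples apart, so one output value
    covers (len(kernel) - 1) * dilation + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(
            sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        )
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field (in input frames) of stacked stride-1 dilated layers."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Hypothetical values: a 2-tap kernel with dilation 2 sums samples two apart.
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))  # [4, 6, 8]

# Hypothetical stack: three layers, kernel size 3, dilations 1, 2, 4.
print(receptive_field(3, [1, 2, 4]))  # 15
```

With these assumed settings, three layers see 15 input frames, whereas three undilated layers of the same kernel size would see only 7. Unlike a recurrent layer, each output position can also be computed independently of the others.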
Pages: 601-604
Page count: 4