Speech Emotion Recognition using Context-Aware Dilated Convolution Network

Cited by: 5
Authors
Kakuba, Samuel [1]
Han, Dong Seog [2]
Affiliations
[1] Kyungpook Natl Univ, Grad Sch Elect & Elect Engn, Daegu, South Korea
[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu, South Korea
Source
2022 27TH ASIA PACIFIC CONFERENCE ON COMMUNICATIONS (APCC 2022): CREATING INNOVATIVE COMMUNICATION TECHNOLOGIES FOR POST-PANDEMIC ERA | 2022
Keywords
context-aware emotion recognition; multi-head attention; dilated convolution
DOI
10.1109/APCC55198.2022.9943771
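Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract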
Deep learning-based speech emotion recognition has been applied to social living assistance, health monitoring, authentication, and other human-to-machine interaction applications. Because these applications are ubiquitous, computationally efficient and robust speech emotion recognition models are required. The nature of the speech signal requires tracking time steps and analyzing long-term dependencies, the context of the utterances, and spatial cues. Recurrent neural networks such as long short-term memory and gated recurrent units, coupled with attention mechanisms, are often used to capture long-term dependencies and context in the speech signal. However, they do not account for the spatial cues that may exist in the speech signal. Moreover, most of these systems operate sequentially, which causes slow convergence and sluggish training. Therefore, we propose a model that employs dilated convolution layers in combination with hybrid attention mechanisms. The model uses multi-head attention to extract the global context in the feature representations, which are fed into a bidirectional long short-term memory configured with self-attention to further handle the context and long-term dependencies. The model uses spectral and voice quality features extracted from the raw speech signals as input. The proposed model achieves comparable performance in terms of F1 score and accuracy. The proposed model's performance is also presented in terms of confusion matrices.
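As a rough illustration of the dilated convolution named in the abstract, the sketch below applies a single 1-D dilated filter to a toy sequence and computes the receptive field of a stack of such layers. This record does not give the paper's kernel sizes, dilation rates, or layer counts, so the values used here (kernel size 3, dilations 1, 2, 4) and the helper names `dilated_conv1d` and `receptive_field` are assumptions chosen only for illustration, not the authors' configuration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D dilated convolution in cross-correlation form.

    Kernel taps are spaced `dilation` samples apart, so a short kernel
    covers a wider stretch of the input without extra parameters.
    """
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    if len(signal) < span:
        return []
    return [
        sum(kernel[k] * signal[t + k * dilation] for k in range(len(kernel)))
        for t in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated layers sharing one kernel size."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Toy input: 16 frames of a linearly increasing feature (hypothetical data).
frames = [float(i) for i in range(16)]

# A 3-tap difference filter with dilation 2 compares frames 4 steps apart.
out = dilated_conv1d(frames, [1.0, 0.0, -1.0], dilation=2)

# Three stacked layers with assumed dilations 1, 2, 4 and kernel size 3.
rf = receptive_field(3, [1, 2, 4])
```

With these assumed settings, three small layers already see 15 consecutive frames, which is the general reason dilated stacks are used to widen temporal context at low cost.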
Pages: 601-604
Page count: 4
Related papers
50 records in total
  • [31] BENet: A Lightweight Bottom-Up Framework for Context-Aware Emotion Recognition
    Cladiere, Tristan
    Alata, Olivier
    Ducottet, Christophe
    Konik, Hubert
    Legrand, Anne-Claire
    ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS, ACIVS 2023, 2023, 14124 : 100 - 111
  • [32] TimeBird: Context-Aware Graph Convolution Network for Traffic Incident Duration Prediction
    Sun, Fuyong
    Gao, Ruipeng
    Xing, Weiwei
    Zhang, Yaoxue
    Lu, Wei
    Fang, Jun
    Liu, Shui
    Ma, Nan
    Chai, Hua
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS (WASA 2022), PT I, 2022, 13471 : 185 - 195
  • [33] TimeBird: Context-Aware Graph Convolution Network for Traffic Incident Duration Prediction
    Sun, Fuyong
    Gao, Ruipeng
    Xing, Weiwei
    Zhang, Yaoxue
    Lu, Wei
    Fang, Jun
    Liu, Shui
    Ma, Nan
    Chai, Hua
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022, 13471 LNCS : 185 - 195
  • [34] Context-Aware Based Visual-Audio Feature Fusion for Emotion Recognition
    Cheng, Huijie
    Tie, Yun
    Qi, Lin
    Jin, Cong
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [35] Semantic Valence Modeling Emotion Recognition and Affective States in Context-Aware Systems
    Moore, Philip
    Xhafa, Fatos
    Barolli, Leonard
    2014 28TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS (WAINA), 2014, : 536 - 541
  • [36] Evaluating significant features in context-aware multimodal emotion recognition with XAI methods
    Khalane, Aaishwarya
    Makwana, Rikesh
    Shaikh, Talal
    Ullah, Abrar
    EXPERT SYSTEMS, 2025, 42 (01)
  • [37] Context-aware Cascade Attention-based RNN for Video Emotion Recognition
    Sun, Man-Chin
    Hsu, Shih-Huan
    Yang, Min-Chun
    Chien, Jen-Hsien
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [38] Context-Aware Facial Expression Recognition Using Deep Convolutional Neural Network Architecture
    Jain, Abha
    Nigam, Swati
    Singh, Rajiv
    INTELLIGENT HUMAN COMPUTER INTERACTION, IHCI 2023, PT I, 2024, 14531 : 127 - 139
  • [39] Context-Aware Emotion Recognition in the Wild Using Spatio-Temporal and Temporal-Pyramid Models
    Do, Nhu-Tai
    Kim, Soo-Hyung
    Yang, Hyung-Jeong
    Lee, Guee-Sang
    Yeom, Soonja
    SENSORS, 2021, 21 (07)
  • [40] Attention-Based Multi-Learning Approach for Speech Emotion Recognition With Dilated Convolution
    Kakuba, Samuel
    Poulose, Alwin
    Han, Dong Seog
    IEEE ACCESS, 2022, 10 : 122302 - 122313