Speech Emotion Recognition via Generation using an Attention-based Variational Recurrent Neural Network

Cited by: 10
Authors
Baruah, Murchana [1]
Banerjee, Bonny
Affiliations
[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA
Source
INTERSPEECH 2022 | 2022
Keywords
Speech emotion recognition; recognition by generation; variational RNN; MFCC; attention; active inference; predictive coding; FEATURES;
DOI
10.21437/Interspeech.2022-753
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
The last decade has seen an exponential rise in the number of attention-based models for speech emotion recognition (SER). Most of these models use a spectrogram as the input speech representation and the CNN or RNN or convolutional RNN as the key machine learning (ML) component, and learn feature weights to implement attention. We propose an attention-based model for SER that uses MFCC as the input speech representation and a variational RNN (VRNN) as the key ML component. Since the MFCC is of lower dimension than a spectrogram, the model is size- and data-efficient. The VRNN has been used for problems in vision but rarely for SER. Our model is predictive in nature. At each instant, it infers the emotion class and generates the next observation, computes the generation error, and selectively samples (attends to) the locations of high error. Thus, attention emerges in our model, and does not require learning feature weights. This simple model provides interesting insights when evaluated for SER on benchmark datasets. The model can operate on variable length and infinite duration audio files. This work is the first to explore simultaneous generation and recognition for SER, where the generation capability is necessary for efficient recognition.
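The predict, compare, attend loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the VRNN generator is replaced by a hypothetical stand-in predictor (the current belief about the last frame), emotion-class inference is omitted, and the function names `generation_error`, `attend`, and `run` as well as the parameter `k` are assumptions made for illustration. It only shows how attention can emerge from generation error without learned feature weights.

```python
from typing import List, Sequence


def generation_error(predicted: Sequence[float], observed: Sequence[float]) -> List[float]:
    """Squared error per coefficient between a generated and an observed frame."""
    if len(predicted) != len(observed):
        raise ValueError("frames must have the same length")
    return [(p - o) ** 2 for p, o in zip(predicted, observed)]


def attend(errors: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest errors (ties go to the lower index), sorted ascending."""
    ranked = sorted(range(len(errors)), key=lambda i: (-errors[i], i))
    return sorted(ranked[:k])


def run(frames: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """For each frame after the first, return the coefficient indices attended to.

    Assumption: the "generator" simply predicts that the next frame equals the
    current belief. A real system would use a learned model such as a VRNN here.
    """
    belief = list(frames[0])  # assume the first frame is fully observed
    attended = []
    for frame in frames[1:]:
        predicted = list(belief)                     # stand-in generation step
        errors = generation_error(predicted, frame)  # generation error per coefficient
        idx = attend(errors, k)                      # attend to the highest-error locations
        for i in idx:                                # sample only the attended coefficients
            belief[i] = frame[i]
        attended.append(idx)
    return attended


if __name__ == "__main__":
    # Toy 3-coefficient "MFCC" frames; values are made up for illustration.
    toy_frames = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [1.0, 2.0, 0.5]]
    print(run(toy_frames, k=1))
```

With `k` smaller than the frame dimension, only part of each observation is sampled, and the unattended coefficients keep their predicted values.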
Pages: 4710 - 4714
Page count: 5
Related papers
50 in total
  • [31] Speech Emotion Recognition Using 3D Convolutions and Attention-Based Sliding Recurrent Networks With Auditory Front-Ends
    Peng, Zhichao
    Li, Xingfeng
    Zhu, Zhi
    Unoki, Masashi
    Dang, Jianwu
    Akagi, Masato
    IEEE ACCESS, 2020, 8 : 16560 - 16572
  • [32] Attention-based deep neural network for driver behavior recognition
    Xiao, Weichu
    Liu, Hongli
    Ma, Ziji
    Chen, Weihong
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 132 : 152 - 161
  • [33] Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients
    Pawar, Manju D.
    Kokate, Rajendra D.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (10) : 15563 - 15587
  • [34] Sequence-to-sequence Modelling for Categorical Speech Emotion Recognition Using Recurrent Neural Network
    Chen, Xiaomin
    Han, Wenjing
    Ruan, Huabin
    Liu, Jiamu
    Li, Haifeng
    Jiang, Dongmei
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [35] High-level Feature Representation using Recurrent Neural Network for Speech Emotion Recognition
    Lee, Jinkyu
    Tashev, Ivan
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1537 - 1540
  • [36] SPEECH EMOTION RECOGNITION USING COMBINED TRIPLET AND SINGLE-HEADED ATTENTION-ENABLED INTEGRATED RECURRENT NETWORK
    Rajput, Vaishali
    Musale, Vinayak
    Mohite, Sagar
    Amune, Amruta
    Jadhav, Swati
    BIOMEDICAL ENGINEERING-APPLICATIONS BASIS COMMUNICATIONS, 2024,
  • [37] Speech emotion recognition based on spiking neural network and convolutional neural network
    Du, Chengyan
    Liu, Fu
    Kang, Bing
    Hou, Tao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 147
  • [38] IMPROVING CONVOLUTIONAL RECURRENT NEURAL NETWORKS FOR SPEECH EMOTION RECOGNITION
    Meyer, Patrick
    Xu, Ziyi
    Fingscheidt, Tim
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 365 - 372
  • [39] Speech Emotion Recognition Using Generative Adversarial Network and Deep Convolutional Neural Network
    Bhangale, Kishor
    Kothandaraman, Mohanaprasad
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (04) : 2341 - 2384
  • [40] Speech Emotion Recognition Using Convolutional Neural Networks with Attention Mechanism
    Mountzouris, Konstantinos
    Perikos, Isidoros
    Hatzilygeroudis, Ioannis
    Corchado, Juan M.
    Iglesias, Carlos A.
    Kim, Byung-Gyu
    Mehmood, Rashid
    Ren, Fuji
    Lee, In
    ELECTRONICS, 2023, 12 (20)