Speech Emotion Recognition via Generation using an Attention-based Variational Recurrent Neural Network

Cited by: 10
Authors
Baruah, Murchana [1]
Banerjee, Bonny
Affiliations
[1] Univ Memphis, Inst Intelligent Syst, Memphis, TN 38152 USA
Source
INTERSPEECH 2022 | 2022
Keywords
Speech emotion recognition; recognition by generation; variational RNN; MFCC; attention; active inference; predictive coding; FEATURES;
DOI
10.21437/Interspeech.2022-753
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The last decade has seen an exponential rise in the number of attention-based models for speech emotion recognition (SER). Most of these models use a spectrogram as the input speech representation, use a CNN, RNN, or convolutional RNN as the key machine learning (ML) component, and learn feature weights to implement attention. We propose an attention-based model for SER that uses MFCC as the input speech representation and a variational RNN (VRNN) as the key ML component. Since the MFCC is of lower dimension than a spectrogram, the model is size- and data-efficient. The VRNN has been used for problems in vision but rarely for SER. Our model is predictive in nature. At each instant, it infers the emotion class, generates the next observation, computes the generation error, and selectively samples (attends to) the locations of high error. Thus, attention emerges in our model and does not require learning feature weights. This simple model provides interesting insights when evaluated for SER on benchmark datasets. The model can operate on variable-length and infinite-duration audio files. This work is the first to explore simultaneous generation and recognition for SER, where the generation capability is necessary for efficient recognition.
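The predict, compare, and attend cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' VRNN: the predictor here is a hypothetical placeholder (the running belief about the current frame is reused as the prediction for the next frame), emotion-class inference is omitted, and the function names and the parameter `k` are assumptions made only for illustration. It shows how attention can fall out of the generation error without any learned attention weights.

```python
from typing import List, Sequence


def top_k_error_locations(
    predicted: Sequence[float], observed: Sequence[float], k: int
) -> List[int]:
    """Return the indices of the k coefficients with the largest
    absolute generation error, in ascending index order."""
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return sorted(ranked[:k])


def attend_over_sequence(frames: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Walk a sequence of MFCC-like frames and record which locations
    are attended to at each step.

    Assumption: the prediction for the next frame is simply the current
    belief, a stand-in for the observation a generative model would
    produce. The belief is updated only at the attended locations.
    """
    belief = list(frames[0])
    attended: List[List[int]] = []
    for frame in frames[1:]:
        prediction = belief  # placeholder for the generated observation
        idx = top_k_error_locations(prediction, frame, k)
        for i in idx:
            belief[i] = frame[i]  # sample only where the error is high
        attended.append(idx)
    return attended


if __name__ == "__main__":
    toy_frames = [[0.0, 0.0, 0.0], [1.0, 0.1, 0.0], [1.0, 0.1, 2.0]]
    print(attend_over_sequence(toy_frames, k=1))
```

In this toy run, the attended location moves to whichever coefficient the placeholder predictor currently gets most wrong, which is the sense in which attention "emerges" from the error signal rather than from learned weights.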
Pages: 4710-4714
Page count: 5