EMOCEPTION: AN INCEPTION INSPIRED EFFICIENT SPEECH EMOTION RECOGNITION NETWORK

Cited by: 0
Authors
Singh, Chirag [1]
Kumar, Abhay [1]
Nagar, Ajay [1]
Tripathi, Suraj [1]
Yenigalla, Promod [1]
Affiliation
[1] Samsung R&D Inst India, Bangalore, Karnataka, India
Keywords
Speech Emotion Recognition; Inception; Multi-Task Learning; CNN
DOI
10.1109/asru46091.2019.9004020
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This research proposes Emoception, a deep neural network architecture for speech emotion recognition that takes inspiration from Inception modules. The network takes speech features such as Mel-Frequency Spectral Coefficients (MFSC) or Mel-Frequency Cepstral Coefficients (MFCC) as input and recognizes the emotion expressed in the speech. We train on the USC-IEMOCAP dataset, but the limited amount of training data combined with the large depth of the network makes the model prone to overfitting, which reduces validation accuracy. Emoception addresses this problem by extending the network in width without increasing computational cost. We also employ Multi-Task Learning (MTL), a powerful regularization technique, to make the network more robust. With MFSC input, the model trained with MTL improves accuracy by 1.6% over Emoception without MTL. We report an overall accuracy improvement of around 4.6% over existing state-of-the-art methods for four emotion classes on the IEMOCAP dataset.
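
The two input features named in the abstract are closely related: in the usual formulation, MFCCs are obtained by applying a discrete cosine transform (DCT) to log mel filterbank energies, which is what MFSC refers to here. A minimal sketch of that step follows, assuming a precomputed array of shape (frames, mel bands). The function name mfsc_to_mfcc, the 40-band and 13-coefficient sizes, and the random toy data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def mfsc_to_mfcc(mfsc, num_ceps=13):
    """Apply an orthonormal DCT-II along the mel axis of a (frames, mel_bands) array.

    Hypothetical helper for illustration; not the authors' feature pipeline.
    """
    mfsc = np.asarray(mfsc, dtype=float)
    n_mels = mfsc.shape[1]
    k = np.arange(num_ceps)[:, None]   # cepstral coefficient index
    n = np.arange(n_mels)[None, :]     # mel band index
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    # Orthonormal scaling: sqrt(1/N) for k = 0, sqrt(2/N) otherwise.
    scale = np.full((num_ceps, 1), np.sqrt(2.0 / n_mels))
    scale[0, 0] = np.sqrt(1.0 / n_mels)
    return mfsc @ (basis * scale).T


# Toy stand-in for log mel energies: 100 frames, 40 mel bands (assumed sizes).
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(100, 40))
mfcc = mfsc_to_mfcc(log_mel, num_ceps=13)
print(mfcc.shape)
```

Keeping only the first few DCT coefficients decorrelates and compresses the mel spectrum, whereas MFSC input preserves the local time-frequency structure that convolutional filters can exploit.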
Pages: 787-791
Page count: 5