Speech Emotion Recognition Using Speech Feature and Word Embedding

Cited by: 0
Authors
Atmaja, Bagus Tris [1 ,2 ]
Shirai, Kiyoaki [2 ]
Akagi, Masato [2 ]
Affiliations
[1] Inst Teknol Sepuluh Nopember, Surabaya, Indonesia
[2] Japan Adv Inst Sci & Technol, Nomi, Japan
Keywords
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Emotion recognition can be performed automatically from many modalities. This paper presents a categorical speech emotion recognition method using speech features and word embeddings. Text features can be combined with speech features to improve emotion recognition accuracy, and both can be obtained from speech. Here, acoustic features for speech-based emotion recognition are extracted from speech segments, which are obtained by removing silences from each utterance. Word embeddings are used as input features for text emotion recognition, and a combination of both feature types is proposed to improve performance. Two unidirectional LSTM layers are used for text, and fully connected layers are applied for acoustic emotion recognition. The two networks are then merged by fully connected layers in an early-fusion manner to predict one of four emotion categories. The results show that the combination of speech and text achieves a higher accuracy (75.49%) than speech-only (58.29%) or text-only (68.01%) emotion recognition. This result also outperforms previously proposed methods by others using the same dataset and the same modalities.
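The architecture described in the abstract can be sketched as a forward pass in NumPy. This is a minimal illustration, not the authors' implementation: the layer sizes, the feature dimensions, the ReLU activations, the random weights, and all function names (`lstm_forward`, `dense`, `predict`) are assumptions made for the example, and the acoustic input is assumed to be a single utterance-level feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, hidden_dim):
    # One stacked weight matrix for the four gates (input, forget, cell, output).
    return {
        "W": rng.normal(0, 0.1, (input_dim + hidden_dim, 4 * hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }

def lstm_forward(params, seq):
    """Run a unidirectional LSTM over seq (T, input_dim); return hidden states (T, H)."""
    hidden_dim = params["b"].size // 4
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = []
    for x_t in seq:
        z = np.concatenate([x_t, h]) @ params["W"] + params["b"]
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def init_dense(in_dim, out_dim):
    return {"W": rng.normal(0, 0.1, (in_dim, out_dim)), "b": np.zeros(out_dim)}

def dense(params, x, activation=None):
    y = x @ params["W"] + params["b"]
    return np.maximum(y, 0.0) if activation == "relu" else y

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sizes; the abstract only fixes the number of classes (four).
EMBED_DIM, ACOUSTIC_DIM, HIDDEN, N_CLASSES = 50, 34, 32, 4

text_lstm1 = init_lstm(EMBED_DIM, HIDDEN)      # first LSTM layer (text)
text_lstm2 = init_lstm(HIDDEN, HIDDEN)         # second LSTM layer (text)
speech_fc1 = init_dense(ACOUSTIC_DIM, HIDDEN)  # fully connected layers (speech)
speech_fc2 = init_dense(HIDDEN, HIDDEN)
fusion_fc = init_dense(2 * HIDDEN, HIDDEN)     # fully connected fusion layer
output_fc = init_dense(HIDDEN, N_CLASSES)

def predict(word_embeddings, acoustic_features):
    # Text branch: two stacked unidirectional LSTMs; keep the last hidden state.
    text_repr = lstm_forward(text_lstm2, lstm_forward(text_lstm1, word_embeddings))[-1]
    # Speech branch: fully connected layers on the acoustic feature vector.
    speech_repr = dense(speech_fc2, dense(speech_fc1, acoustic_features, "relu"), "relu")
    # Fusion: concatenate both representations, then classify into four categories.
    fused = np.concatenate([text_repr, speech_repr])
    hidden = dense(fusion_fc, fused, "relu")
    return softmax(dense(output_fc, hidden))

# Example call with random inputs: a 12-word utterance and one acoustic vector.
probs = predict(rng.normal(size=(12, EMBED_DIM)), rng.normal(size=ACOUSTIC_DIM))
```

Here `probs` holds one probability per emotion category. With untrained random weights the values carry no meaning; the sketch only shows how the text and speech branches are joined before classification.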
Pages: 519-523
Number of pages: 5