Multi-modal Emotion Recognition using Speech Features and Text Embedding

Cited: 0
Authors
Kim J.-H. [1 ]
Lee S.-P. [2 ]
Affiliations
[1] Dept. of Computer Science, Sangmyung University
[2] Dept. of Electronic Engineering, Sangmyung University
Keywords
Speech emotion recognition; emotion recognition; multi-modal emotion recognition; deep learning
DOI: 10.5370/KIEE.2021.70.1.108
Abstract
Many studies have conducted emotion recognition using audio signals, as such data is easy to collect. However, the accuracy is lower than that of other methods, such as those using facial images or video signals. In this paper, we propose an emotion recognition method that uses speech signals and text simultaneously to achieve better performance. For training, we generate 43 feature vectors, such as MFCC, spectral features, and harmonic features, from the audio data. In addition, 256-dimensional embedding vectors are extracted from the text data using a pretrained Tacotron encoder. The feature vectors and text embedding vectors are fed into separate LSTM layers followed by fully connected layers, each of which produces a probability distribution over the predicted output classes. By averaging both results, the data is assigned to one of four emotion categories: anger, happiness, sadness, or neutrality. Our proposed model outperforms previous state-of-the-art methods on a Korean emotional speech dataset. © 2021 Korean Institute of Electrical Engineers. All rights reserved.
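The late-fusion scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the random projection matrices stand in for the trained LSTM-plus-fully-connected branches, and the feature dimensions (43 speech features, a 256-dimensional Tacotron embedding, four classes) are the only details taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

EMOTIONS = ["anger", "happiness", "sadness", "neutrality"]

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical stand-ins for the two trained branches. In the paper each
# branch is an LSTM followed by a fully connected layer; random projection
# matrices play that role here purely for illustration.
W_audio = rng.normal(size=(43, 4))   # 43 speech features -> 4 classes
W_text = rng.normal(size=(256, 4))   # 256-dim Tacotron embedding -> 4 classes

def predict(audio_feats, text_emb):
    """Late fusion: average the class probabilities of both branches."""
    p_audio = softmax(audio_feats @ W_audio)
    p_text = softmax(text_emb @ W_text)
    probs = (p_audio + p_text) / 2.0
    return EMOTIONS[int(np.argmax(probs))], probs

# Dummy inputs with the dimensions stated in the abstract.
audio_feats = rng.normal(size=43)    # e.g. MFCC, spectral, harmonic features
text_emb = rng.normal(size=256)      # pretrained Tacotron encoder output
label, probs = predict(audio_feats, text_emb)
print(label, probs.round(3))
```

Averaging the two softmax outputs is the simplest decision-level fusion; each modality contributes equally regardless of its confidence, which matches the abstract's "combining the average of both results."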
Pages: 108-113 (5 pages)
Related Papers (50 total)
  • [1] Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding
    Byun, Sung-Woo
    Kim, Ju-Hee
    Lee, Seok-Pil
    APPLIED SCIENCES-BASEL, 2021, 11 (17):
  • [2] Implementation of Multi-modal Speech Emotion Recognition Using Text Data and Audio Signals
    Adesola, Falade
    Adeyinka, Omirinlewo
    Kayode, Akindeji
    Ayodele, Adebiyi
    2023 International Conference on Science, Engineering and Business for Sustainable Development Goals, SEB-SDG 2023, 2023,
  • [3] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 364 - 368
  • [4] Multi-modal emotion recognition using EEG and speech signals
    Wang, Qian
    Wang, Mou
    Yang, Yan
    Zhang, Xiaolei
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 149
  • [5] Multi-Modal Emotion Recognition by Fusing Correlation Features of Speech-Visual
    Chen Guanghui
    Zeng Xiaoping
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 533 - 537
  • [6] Multi-modal Emotion Recognition Based on Speech and Image
    Li, Yongqiang
    He, Qi
    Zhao, Yongping
    Yao, Hongxun
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 844 - 853
  • [7] Multi-modal Correlated Network for emotion recognition in speech
    Ren, Minjie
    Nie, Weizhi
    Liu, Anan
    Su, Yuting
    VISUAL INFORMATICS, 2019, 3 (03) : 150 - 155
  • [8] Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition
    Yang, Dingkang
    Huang, Shuai
    Liu, Yang
    Zhang, Lihua
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2093 - 2097
  • [9] Audio-Visual Emotion Recognition System Using Multi-Modal Features
    Handa, Anand
    Agarwal, Rashi
    Kohli, Narendra
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [10] Facial emotion recognition using multi-modal information
    De Silva, LC
    Miyasato, T
    Nakatsu, R
    ICICS - PROCEEDINGS OF 1997 INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, VOLS 1-3: THEME: TRENDS IN INFORMATION SYSTEMS ENGINEERING AND WIRELESS MULTIMEDIA COMMUNICATIONS, 1997, : 397 - 401