Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

Cited by: 7
Authors
Byun, Sung-Woo [1]
Kim, Ju-Hee [1]
Lee, Seok-Pil [2]
Affiliations
[1] SangMyung Univ, Grad Sch, Dept Comp Sci, Seoul 03016, South Korea
[2] SangMyung Univ, Dept Elect Engn, Seoul 03016, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Volume 11 / Issue 17
Keywords
speech emotion recognition; emotion recognition; multi-modal emotion recognition
DOI
10.3390/app11177967
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703
Abstract
Intelligent personal assistants, chatbots and AI speakers are increasingly used as communication interfaces, and the demand for more natural ways of interacting with them has grown accordingly. Humans express emotions in various ways, such as through tone of voice or facial expressions; multimodal approaches to recognizing human emotions have therefore been studied. In this paper, we propose an emotion recognition method that improves accuracy by using both speech and text data, drawing on the strengths of each. We extracted 43 feature vectors, such as spectral features, harmonic features and MFCCs, from speech datasets. In addition, 256 embedding vectors were extracted from the transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and the embedding vectors were fed into their respective deep learning models, which produced probabilities for the predicted output classes. The results show that the proposed model is more accurate than those reported in previous research.
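
The abstract describes two models, one for speech and one for text, each producing class probabilities, but it does not say how those probabilities are combined. The sketch below illustrates one common option, a weighted average of the two probability vectors (late fusion). The fusion rule, the label set, the logit values and the names `softmax` and `fuse` are all assumptions made for illustration and are not taken from the paper.

```python
import math

# Hypothetical label set; the abstract does not list the emotion classes.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(p_speech, p_text, w_speech=0.5):
    """Weighted average of two probability vectors (assumed fusion rule)."""
    if len(p_speech) != len(p_text):
        raise ValueError("both models must score the same classes")
    return [w_speech * a + (1.0 - w_speech) * b
            for a, b in zip(p_speech, p_text)]


# Made-up scores standing in for the outputs of the two models.
speech_logits = [2.0, 0.5, 0.1, -1.0]
text_logits = [0.2, 1.5, 0.3, -0.5]

p_speech = softmax(speech_logits)
p_text = softmax(text_logits)
p_fused = fuse(p_speech, p_text)

# Pick the class with the highest fused probability.
best = max(range(len(p_fused)), key=p_fused.__getitem__)
predicted = EMOTIONS[best]
print(predicted, [round(p, 3) for p in p_fused])
```

Because each input is a probability vector and the weights sum to 1, the fused vector is also a valid probability distribution. The weight `w_speech` could be tuned on held-out data if one modality proves more reliable.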
Pages: 9