Multi-Modal Emotion Recognition Using Speech Features and Text-Embedding

Cited by: 7

Authors:

Byun, Sung-Woo [1]

Kim, Ju-Hee [1]

Lee, Seok-Pil [2]

Affiliations:

[1] SangMyung Univ, Grad Sch, Dept Comp Sci, Seoul 03016, South Korea

[2] SangMyung Univ, Dept Elect Engn, Seoul 03016, South Korea

Source:

APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 17

Keywords:

speech emotion recognition; emotion recognition; multi-modal emotion recognition

DOI:

10.3390/app11177967

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Intelligent personal assistants, chatbots and AI speakers are increasingly used as communication interfaces, and the demand for more natural ways of interacting with them has grown accordingly. Humans express emotions in various ways, such as through tone of voice or facial expressions; therefore, multimodal approaches to recognizing human emotions have been studied. In this paper, we propose an emotion recognition method that improves accuracy by using both speech and text data and exploiting the strengths of each. We extracted 43 feature vectors, such as spectral features, harmonic features and MFCCs, from speech datasets. In addition, we extracted 256 embedding vectors from transcripts using a pre-trained Tacotron encoder. The acoustic feature vectors and the embedding vectors were fed into their respective deep learning models, each of which produced probabilities for the output classes. The results show that the proposed model is more accurate than the models in previous research.
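
The abstract says that each modality's model outputs class probabilities, but it does not state how the two outputs are combined. As a rough illustration only, the sketch below assumes a simple late fusion by weighted averaging. The label set, the `speech_weight` parameter, the function names and the input scores are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical label set; the abstract does not list the emotion classes.
EMOTIONS = ["angry", "happy", "neutral", "sad"]


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(speech_probs, text_probs, speech_weight=0.5):
    """Weighted average of two per-class probability vectors (late fusion).

    Assumption: the fusion rule here is illustrative, not the paper's.
    """
    if len(speech_probs) != len(text_probs):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must be in [0, 1]")
    return [
        speech_weight * s + (1.0 - speech_weight) * t
        for s, t in zip(speech_probs, text_probs)
    ]


def predict(speech_logits, text_logits, speech_weight=0.5):
    """Return the most probable label and the fused distribution."""
    fused = fuse(softmax(speech_logits), softmax(text_logits), speech_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused


# Made-up scores standing in for the outputs of the speech and text models.
label, fused = predict([2.0, 0.5, 0.1, 0.3], [0.2, 0.4, 0.1, 3.0])
print(label, [round(p, 3) for p in fused])
```

With these made-up scores the speech model favours the first class and the text model strongly favours the last one, so the equal-weight average follows the more confident text model. Setting `speech_weight=1.0` reduces the function to the speech model alone, which is a convenient way to compare single-modality and fused predictions.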


Pages: 9
