Audio-Textual Emotion Recognition Based on Improved Neural Networks

Cited: 18
Authors
Cai, Linqin [1 ]
Hu, Yaxin [1 ]
Dong, Jiangong [1 ]
Zhou, Sitong [1 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Minist Educ, Key Lab Ind Internet Things & Networked Control, Chongqing, Peoples R China
Funding
National Key Research and Development Program of China;
DOI
10.1155/2019/2593036
Chinese Library Classification
T [Industrial Technology];
Discipline Classification Code
08;
Abstract
With the rapid development of social media, single-modal emotion recognition can hardly satisfy the demands of current emotion recognition systems. To improve performance, this paper proposes a multimodal emotion recognition model based on speech and text. Considering the complementarity between the two modalities, a CNN (convolutional neural network) and an LSTM (long short-term memory) network were combined as dual channels to learn acoustic emotion features, while a Bi-LSTM (bidirectional long short-term memory) network was used to capture textual features. A deep neural network was then applied to learn and classify the fused features, and the final emotional state was determined from the outputs of both the speech and text emotion analyses. Multimodal fusion experiments on the IEMOCAP database validate the proposed model: compared with the single-modal baselines, the overall recognition accuracy increased by 6.70% over text-only recognition and by 13.85% over speech-only recognition. Experimental results show that the multimodal model achieves higher accuracy than either single modality and outperforms other published multimodal models on the test datasets.
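The abstract describes a fusion stage in which acoustic and textual features are combined and classified by a deep neural network. The sketch below illustrates feature-level fusion only, under stated assumptions: the CNN/LSTM and Bi-LSTM encoders are replaced by precomputed feature vectors, and the dimensions, layer sizes, and weights are hypothetical (the abstract does not specify them); this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the abstract does not give exact dimensions.
AUDIO_DIM, TEXT_DIM, HIDDEN, N_CLASSES = 128, 100, 64, 4  # 4 IEMOCAP emotion classes

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(audio_feat, text_feat, W1, b1, W2, b2):
    """Feature-level fusion: concatenate the two modality embeddings,
    then pass them through a small dense network with a softmax output.
    A sketch of the DNN fusion stage, not the paper's exact network."""
    x = np.concatenate([audio_feat, text_feat], axis=-1)
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return softmax(h @ W2 + b2)        # class probabilities

# Randomly initialised weights stand in for trained parameters.
W1 = rng.normal(0, 0.1, (AUDIO_DIM + TEXT_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_CLASSES))
b2 = np.zeros(N_CLASSES)

# Stand-ins for the acoustic (CNN+LSTM) and textual (Bi-LSTM) embeddings.
audio_feat = rng.normal(size=AUDIO_DIM)
text_feat = rng.normal(size=TEXT_DIM)

probs = fuse_and_classify(audio_feat, text_feat, W1, b1, W2, b2)
print(probs.shape)  # one probability per emotion class
```

The concatenation-then-classify pattern shown here is the simplest form of feature-level fusion; the paper additionally combines the speech and text decisions to determine the final emotional state.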
Pages: 9
Related Papers
50 records
  • [1] Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models
    de Oliveira, Danilo
    Prabhu, Navin Raj
    Gerkmann, Timo
    INTERSPEECH 2023, 2023, : 3632 - 3636
  • [2] Deep neural networks for emotion recognition combining audio and transcripts
    Cho, Jaejin
    Pappagari, Raghavendra
    Kulkarni, Purva
    Villalba, Jesus
    Carmiel, Yishay
    Dehak, Najim
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 247 - 251
  • [3] Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation
    Wei, Kun
    Li, Bei
    Lv, Hang
    Lu, Quan
    Jiang, Ning
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 2432 - 2444
  • [4] Audio Segmentation based Approach for Improved Emotion Recognition
    Pandharipande, Meghna Abhishek
    Kopparapu, Sunil Kumar
    TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE, 2015,
  • [5] Auxiliary audio-textual modalities for better action recognition on vision-specific annotated videos
    Alfasly, Saghir
    Lu, Jian
    Xu, Chen
    Li, Yu
    Zou, Yuru
    PATTERN RECOGNITION, 2024, 156
  • [6] Multi-view Neural Networks for Raw Audio-based Music Emotion Recognition
    He, Na
    Ferguson, Sam
    2020 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2020), 2020, : 168 - 172
  • [7] Cascaded cross-modal transformer for audio-textual classification
    Ristea, Nicolae-Catalin
    Anghel, Andrei
    Ionescu, Radu Tudor
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (09)
  • [8] Audio-Textual Arabic Dialect Identification for Opinion Mining Videos
    Al-Azani, Sadam
    El-Alfy, El-Sayed M.
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 2470 - 2475
  • [9] Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism
    Liu, Min
    Tang, Jun
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2021, 17 (04): : 754 - 771
  • [10] A Deep Ensemble Approach of Anger Detection from Audio-Textual Conversations
    Nahar, Mahjabin
    Ali, Mohammed Eunus
    2022 10TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2022,