Audio-Textual Emotion Recognition Based on Improved Neural Networks

Cited: 18
Authors
Cai, Linqin [1 ]
Hu, Yaxin [1 ]
Dong, Jiangong [1 ]
Zhou, Sitong [1 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Minist Educ, Key Lab Ind Internet Things & Networked Control, Chongqing, Peoples R China
Funding
National Key Research and Development Program of China;
DOI
10.1155/2019/2593036
Chinese Library Classification
T [Industrial Technology];
Discipline Classification Code
08;
Abstract
With the rapid development of social media, single-modal emotion recognition can hardly satisfy the demands of current emotion recognition systems. To improve performance, this paper proposes a multimodal emotion recognition model based on speech and text. Considering the complementarity between the two modalities, a CNN (convolutional neural network) and an LSTM (long short-term memory) network were combined as dual channels to learn acoustic emotion features, while a Bi-LSTM (bidirectional long short-term memory) network was used to capture textual features. A deep neural network was then applied to learn and classify the fused features, and the final emotional state was determined from the outputs of both the speech and text emotion analyses. Multimodal fusion experiments on the IEMOCAP database validate the proposed model: compared with the single-modal baselines, the overall recognition accuracy increased by 6.70% over text-only recognition and by 13.85% over speech-only recognition. Experimental results show that the multimodal model achieves higher accuracy than either single modality and outperforms other published multimodal models on the test datasets.
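The abstract describes a fusion stage in which acoustic and textual features are combined and classified by a deep neural network. The sketch below illustrates feature-level fusion only, under stated assumptions: the CNN/LSTM and Bi-LSTM encoders are replaced by precomputed feature vectors, and the dimensions, layer sizes, and weights are hypothetical (the abstract does not specify them); this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the abstract does not give exact dimensions.
AUDIO_DIM, TEXT_DIM, HIDDEN, N_CLASSES = 128, 100, 64, 4  # 4 IEMOCAP emotion classes

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(audio_feat, text_feat, W1, b1, W2, b2):
    """Feature-level fusion: concatenate the two modality embeddings,
    then pass them through a small dense network with a softmax output.
    A sketch of the DNN fusion stage, not the paper's exact network."""
    x = np.concatenate([audio_feat, text_feat], axis=-1)
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return softmax(h @ W2 + b2)        # class probabilities

# Randomly initialised weights stand in for trained parameters.
W1 = rng.normal(0, 0.1, (AUDIO_DIM + TEXT_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_CLASSES))
b2 = np.zeros(N_CLASSES)

# Stand-ins for the acoustic (CNN+LSTM) and textual (Bi-LSTM) embeddings.
audio_feat = rng.normal(size=AUDIO_DIM)
text_feat = rng.normal(size=TEXT_DIM)

probs = fuse_and_classify(audio_feat, text_feat, W1, b1, W2, b2)
print(probs.shape)  # one probability per emotion class
```

The concatenation-then-classify pattern shown here is the simplest form of feature-level fusion; the paper additionally combines the speech and text decisions to determine the final emotional state.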
Pages: 9
Related Papers
50 records
  • [1] Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models
    de Oliveira, Danilo
    Prabhu, Navin Raj
    Gerkmann, Timo
    INTERSPEECH 2023, 2023, : 3632 - 3636
  • [2] Deep neural networks for emotion recognition combining audio and transcripts
    Cho, Jaejin
    Pappagari, Raghavendra
    Kulkarni, Purva
    Villalba, Jesus
    Carmiel, Yishay
    Dehak, Najim
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 247 - 251
  • [3] Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation
    Wei, Kun
    Li, Bei
    Lv, Hang
    Lu, Quan
    Jiang, Ning
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 2432 - 2444
  • [4] Audio Segmentation based Approach for Improved Emotion Recognition
    Pandharipande, Meghna Abhishek
    Kopparapu, Sunil Kumar
    TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE, 2015,
  • [5] Auxiliary audio-textual modalities for better action recognition on vision-specific annotated videos
    Alfasly, Saghir
    Lu, Jian
    Xu, Chen
    Li, Yu
    Zou, Yuru
    PATTERN RECOGNITION, 2024, 156
  • [6] Multi-view Neural Networks for Raw Audio-based Music Emotion Recognition
    He, Na
    Ferguson, Sam
    2020 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2020), 2020, : 168 - 172
  • [7] Cascaded cross-modal transformer for audio-textual classification
    Ristea, Nicolae-Catalin
    Anghel, Andrei
    Ionescu, Radu Tudor
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (09)
  • [8] Audio-Textual Arabic Dialect Identification for Opinion Mining Videos
    Al-Azani, Sadam
    El-Alfy, El-Sayed M.
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 2470 - 2475
  • [9] Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism
    Liu, Min
    Tang, Jun
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2021, 17 (04): : 754 - 771
  • [10] A Deep Ensemble Approach of Anger Detection from Audio-Textual Conversations
    Nahar, Mahjabin
    Ali, Mohammed Eunus
    2022 10TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2022,