Multi-modal emotion recognition in conversation based on prompt learning with text-audio fusion features

Cited by: 0
Authors
Wu, Yuezhou [1 ]
Zhang, Siling [1 ]
Li, Pengfei [1 ]
Affiliations
[1] School of Computer Science, Civil Aviation Flight University of China, Guanghan 618307, People's Republic of China
Source
SCIENTIFIC REPORTS | 2025, Vol. 15, Issue 1
DOI
10.1038/s41598-025-89758-8
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07; 0710; 09;
Abstract
With the widespread adoption of interactive machine applications, Emotion Recognition in Conversations (ERC) has attracted increasing attention. Although existing methods have improved recognition accuracy by integrating structured data, language barriers and the scarcity of non-English resources limit their cross-lingual application. To address this, the MERC-PLTAF method proposed in this paper focuses on multimodal emotion recognition in conversations, aiming to overcome the limitations of a single modality and of language barriers through refined feature extraction and a cross-fusion strategy. We conducted extensive validation on multiple English and Chinese datasets; the experimental results demonstrate that the method not only significantly improves emotion recognition accuracy but also performs exceptionally well on the Chinese M3ED dataset, opening a new path for cross-lingual emotion recognition. This research advances the boundaries of emotion recognition technology and lays a theoretical foundation and practical framework for more intelligent, human-centric interactive experiences.
Pages: 15
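
The abstract describes MERC-PLTAF only at a high level (text-audio feature extraction followed by a cross-fusion strategy), and this record does not specify the actual architecture. The sketch below is therefore only a rough illustration of what a text-audio cross-fusion module might look like, not the authors' implementation: the module name CrossModalFusion, the bidirectional cross-attention design, the 768-dimensional features, and the mean pooling are all assumptions made for illustration.

```python
# Minimal, illustrative sketch of text-audio cross-fusion in PyTorch.
# NOT the MERC-PLTAF implementation: the architecture below (bidirectional
# cross-attention, mean pooling, 768-dim features) is assumed for illustration.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse text and audio utterance features via bidirectional cross-attention."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Text queries attend to audio keys/values, and vice versa.
        self.text_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_a = nn.LayerNorm(dim)
        self.proj = nn.Linear(2 * dim, dim)  # merge the two enriched streams

    def forward(self, text_feats: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:  (batch, text_len, dim), e.g. token embeddings from a text encoder
        # audio_feats: (batch, audio_len, dim), e.g. frame embeddings from a speech encoder
        t2a, _ = self.text_to_audio(text_feats, audio_feats, audio_feats)
        a2t, _ = self.audio_to_text(audio_feats, text_feats, text_feats)
        # Residual connections preserve each modality's own information.
        t = self.norm_t(text_feats + t2a)
        a = self.norm_a(audio_feats + a2t)
        # Pool each stream to one utterance-level vector and concatenate.
        fused = torch.cat([t.mean(dim=1), a.mean(dim=1)], dim=-1)
        return self.proj(fused)  # (batch, dim) fused utterance representation


if __name__ == "__main__":
    fusion = CrossModalFusion(dim=768)
    text = torch.randn(4, 32, 768)    # 4 utterances, 32 text tokens each
    audio = torch.randn(4, 120, 768)  # 4 utterances, 120 audio frames each
    print(fusion(text, audio).shape)  # torch.Size([4, 768])
```

The fused utterance vector would typically feed a classifier head over the emotion labels; in a prompt-learning setup such as the one the title suggests, it could instead condition a pretrained language model, but the record gives no detail on that stage.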