Speak From Heart: An Emotion-Guided LLM-Based Multimodal Method for Emotional Dialogue Generation

Times cited: 1
Authors
Liu, Chenxiao [1]
Xie, Zheyong [1]
Zhao, Sirui [1]
Zhou, Jin [1]
Xu, Tong [1]
Li, Minglei [2]
Chen, Enhong [1]
Affiliations
[1] University of Science and Technology of China, Hefei, People's Republic of China
[2] Huawei Cloud, Shenzhen, People's Republic of China
Source
Proceedings of the 4th Annual ACM International Conference on Multimedia Retrieval (ICMR 2024) | 2024
Funding
National Natural Science Foundation of China
Keywords
Large Language Models; Emotional expression; Multimodal cues; Emotional retrieval module; Dialogue systems
DOI
10.1145/3652583.3658104
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent advancements in Large Language Models (LLMs) have greatly enhanced the generation capabilities of dialogue systems. However, progress on emotional expression in dialogue may still be limited, especially in capturing and processing the multimodal cues that convey emotion. It is therefore pressing to fully adapt the multimodal understanding ability and transferability of LLMs to strengthen their emotion-oriented multimodal processing capabilities. To that end, in this paper, we propose a novel Emotion-Guided Multimodal Dialogue model based on LLMs, termed ELMD. Specifically, to enhance the emotional expression ability of LLMs, ELMD customizes an emotional retrieval module, which mainly provides appropriate response demonstrations that help the LLM understand the emotional context. Building on this demonstration support, we then propose a two-stage training strategy that helps uncover the nuanced emotions behind multimodal information and construct natural responses. Comprehensive experiments demonstrate the effectiveness and superiority of ELMD.
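The abstract does not specify how the emotional retrieval module selects its response demonstrations. The following is only a minimal, text-only sketch of one common way to do retrieval-augmented demonstration selection. It ranks a pool of stored dialogue examples by cosine similarity between precomputed context embeddings, optionally restricts the pool to a given emotion label, and formats the top-k examples as few-shot demonstrations ahead of the query. The `Example` fields, the functions `retrieve_demonstrations` and `build_prompt`, the toy embeddings, and the prompt layout are all hypothetical. The sketch ignores the multimodal features and the two-stage training strategy, and it is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """Hypothetical stored dialogue example used as a demonstration."""
    context: str            # dialogue history (text only in this sketch)
    emotion: str            # emotion label attached to the response
    response: str           # reference response
    embedding: List[float]  # precomputed context embedding (assumed given)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_demonstrations(query_emb: List[float], pool: List[Example],
                            k: int = 2, emotion: Optional[str] = None) -> List[Example]:
    """Return the k pool examples most similar to the query embedding,
    optionally keeping only examples whose emotion label matches."""
    candidates = [ex for ex in pool if emotion is None or ex.emotion == emotion]
    ranked = sorted(candidates, key=lambda ex: cosine(query_emb, ex.embedding),
                    reverse=True)
    return ranked[:k]


def build_prompt(context: str, demos: List[Example]) -> str:
    """Format retrieved examples as few-shot demonstrations before the query."""
    blocks = [f"Context: {ex.context}\nEmotion: {ex.emotion}\nResponse: {ex.response}"
              for ex in demos]
    blocks.append(f"Context: {context}\nResponse:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Toy 2-d embeddings, purely illustrative.
    pool = [
        Example("I lost my job today.", "sadness", "I'm so sorry; that must be hard.", [0.9, 0.1]),
        Example("I got the promotion!", "joy", "Congratulations, you earned it!", [0.1, 0.9]),
        Example("My dog passed away.", "sadness", "I'm sorry for your loss.", [0.8, 0.3]),
    ]
    demos = retrieve_demonstrations([1.0, 0.0], pool, k=2)
    print(build_prompt("I failed my exam.", demos))
```

Filtering by emotion label before ranking is one possible design choice. An alternative is to rank on similarity alone and let the demonstrations' labels guide the LLM implicitly. The abstract does not say which approach, if either, ELMD uses.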
Pages: 533-542
Page count: 10
Cited references: 42