A multitask learning model for multimodal sarcasm, sentiment and emotion recognition in conversations

Cited: 48
Authors
Zhang, Yazhou [1 ]
Wang, Jinglin [2 ]
Liu, Yaochen [2 ]
Rong, Lu [1 ]
Zheng, Qian [1 ]
Song, Dawei [2 ]
Tiwari, Prayag [3 ]
Qin, Jing [4 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 450002, Peoples R China
[2] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[3] Halmstad Univ, Sch Informat Technol, Halmstad, Sweden
[4] Hong Kong Polytech Univ, Ctr Smart Hlth, Sch Nursing, Hong Kong, Peoples R China
Funding
US National Science Foundation;
Keywords
Multimodal sarcasm recognition; Sentiment analysis; Emotion recognition; Multitask learning; Affective computing; INTERACTION DYNAMICS;
DOI
10.1016/j.inffus.2023.01.005
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sarcasm, sentiment and emotion are tightly coupled with each other in that one helps the understanding of another, which makes their joint recognition in conversation a research focus in artificial intelligence (AI) and affective computing. Three main challenges exist: context dependency, multimodal fusion and multitask interaction. However, most existing works fail to explicitly leverage and model the relationships among related tasks. In this paper, we aim to generically address the three problems with a multimodal joint framework. We thus propose a multimodal multitask learning model based on the encoder-decoder architecture, termed M2Seq2Seq. At the heart of the encoder module are two attention mechanisms, i.e., intramodal (Ia) attention and intermodal (Ie) attention. Ia attention is designed to capture the contextual dependency between adjacent utterances, while Ie attention is designed to model multimodal interactions. On the decoder side, we design two kinds of multitask learning (MTL) decoders, i.e., single-level and multilevel decoders, to explore their potential. More specifically, the core of the single-level decoder is a masked outer-modal (Or) self-attention mechanism. The main motivation of Or attention is to explicitly model the interdependence among the tasks of sarcasm, sentiment and emotion recognition. The core of the multilevel decoder contains the shared gating and task-specific gating networks. Comprehensive experiments on four benchmark datasets, MUStARD, Memotion, CMU-MOSEI and MELD, prove the effectiveness of M2Seq2Seq over state-of-the-art baselines (e.g., CM-GCN, A-MTL), with significant improvements of 1.9%, 2.0%, 5.0%, 0.8%, 4.3%, 3.1%, 2.8%, 1.0%, 1.7% and 2.8% in terms of Micro F1.
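The abstract only outlines the architecture. As a rough, hypothetical illustration (not the authors' released implementation), the PyTorch sketch below shows one way the intramodal (Ia) attention, the intermodal (Ie) attention and a masked outer (Or) self-attention decoder over task tokens could be wired together for the three tasks; all module names, dimensions, class counts and the concatenation-based fusion are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an M2Seq2Seq-style encoder and single-level decoder (all details assumed).
import torch
import torch.nn as nn


class MultimodalEncoder(nn.Module):
    """Intramodal (Ia) attention within each modality, then intermodal (Ie) attention across them."""

    def __init__(self, dim=256, heads=4, modalities=("text", "audio", "video")):
        super().__init__()
        self.ia = nn.ModuleDict(
            {m: nn.MultiheadAttention(dim, heads, batch_first=True) for m in modalities}
        )
        self.ie = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        # feats: dict mapping modality name -> (batch, num_utterances, dim) tensor
        # Ia attention: contextual dependency between adjacent utterances, per modality.
        ctx = {m: self.ia[m](x, x, x)[0] for m, x in feats.items()}
        # Ie attention: all modality streams attend over their concatenation (one possible fusion).
        joint = torch.cat(list(ctx.values()), dim=1)
        fused, _ = self.ie(joint, joint, joint)
        return fused.mean(dim=1)  # (batch, dim) pooled multimodal representation


class SingleLevelDecoder(nn.Module):
    """Masked outer (Or) self-attention over task tokens so each task attends only to the others."""

    def __init__(self, dim=256, heads=4, num_classes=(2, 3, 7)):
        # num_classes: assumed output sizes for the sarcasm, sentiment and emotion heads.
        super().__init__()
        self.task_emb = nn.Parameter(torch.randn(len(num_classes), dim))
        self.or_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.heads = nn.ModuleList([nn.Linear(dim, c) for c in num_classes])

    def forward(self, fused):
        # fused: (batch, dim) representation from the encoder
        b, n = fused.size(0), self.task_emb.size(0)
        tokens = self.task_emb.unsqueeze(0).expand(b, -1, -1) + fused.unsqueeze(1)
        # Boolean mask on the diagonal: each task token cannot attend to itself,
        # so its representation is built from the other tasks (cross-task interdependence).
        mask = torch.eye(n, dtype=torch.bool, device=tokens.device)
        out, _ = self.or_attn(tokens, tokens, tokens, attn_mask=mask)
        return [head(out[:, i]) for i, head in enumerate(self.heads)]


if __name__ == "__main__":
    enc, dec = MultimodalEncoder(), SingleLevelDecoder()
    feats = {m: torch.randn(2, 5, 256) for m in ("text", "audio", "video")}
    sarcasm_logits, sentiment_logits, emotion_logits = dec(enc(feats))
    print(sarcasm_logits.shape, sentiment_logits.shape, emotion_logits.shape)
```

The diagonal mask is the key design choice in this sketch: it forces each task's prediction to draw on the representations of the other two tasks, which is the cross-task interdependence the masked Or attention described in the abstract is meant to capture.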
Pages: 282-301
Page count: 20
Related papers
50 records in total
  • [21] Dynamic Weighted Multitask Learning and Contrastive Learning for Multimodal Sentiment Analysis
    Wang, Xingqi
    Zhang, Mengrui
    Chen, Bin
    Wei, Dan
    Shao, Yanli
    ELECTRONICS, 2023, 12 (13)
  • [22] Multimodal Multitask Emotion Recognition using Images, Texts and Tags
    Fortin, Mathieu Page
    Chaib-draa, Brahim
    PROCEEDINGS OF THE ACM WORKSHOP ON CROSSMODAL LEARNING AND APPLICATION (WCRML'19), 2019, : 3 - 10
  • [23] Learning Modality Consistency and Difference Information with Multitask Learning for Multimodal Sentiment Analysis
    Fang, Cheng
    Liang, Feifei
    Li, Tianchi
    Guan, Fangheng
    FUTURE INTERNET, 2024, 16 (06)
  • [24] Sarcasm driven by sentiment: A sentiment-aware hierarchical fusion network for multimodal sarcasm detection
    Liu, Hao
    Wei, Runguo
    Tu, Geng
    Lin, Jiali
    Liu, Cheng
    Jiang, Dazhi
    INFORMATION FUSION, 2024, 108
  • [25] AN END-TO-END MULTITASK LEARNING MODEL TO IMPROVE SPEECH EMOTION RECOGNITION
    Fu, Changzeng
    Liu, Chaoran
    Ishi, Carlos Toshinori
    Ishiguro, Hiroshi
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 351 - 355
  • [26] A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations
    Ma, Hui
    Wang, Jian
    Lin, Hongfei
    Zhang, Bo
    Zhang, Yijia
    Xu, Bo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 776 - 788
  • [27] Multimodal Emotion Recognition based on Global Information Fusion in Conversations
    Kim, Dae Hyeon
    Choi, Young-Seok
    2024 INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS, AND COMMUNICATIONS, ITC-CSCC 2024, 2024,
  • [28] Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis
    Poria, Soujanya
    Chaturvedi, Iti
    Cambria, Erik
    Hussain, Amir
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016, : 439 - 448
  • [29] An emoji-aware multitask framework for multimodal sarcasm detection
    Chauhan, Dushyant Singh
    Singh, Gopendra Vikram
    Arora, Aseem
    Ekbal, Asif
    Bhattacharyya, Pushpak
    KNOWLEDGE-BASED SYSTEMS, 2022, 257
  • [30] Uncertainty-Based Learning of a Lightweight Model for Multimodal Emotion Recognition
    Radoi, Anamaria
    Cioroiu, George
    IEEE ACCESS, 2024, 12 : 120362 - 120374