Modeling inter-modal incongruous sentiment expressions for multi-modal sarcasm detection

Cited: 3
Authors
Ou, Lisong [1 ,2 ,3 ]
Li, Zhixin [1 ,2 ]
Affiliations
[1] Guangxi Normal Univ, Key Lab Educ Blockchain & Intelligent Technol, Minist Educ, Guilin 541004, Peoples R China
[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[3] Guilin Univ Technol, Sch Math & Stat, Guilin 541004, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-modal sarcasm detection; Graph convolutional network; Cross-modal mapping; External knowledge; Cross-correlation graphs;
DOI
10.1016/j.neucom.2024.128874
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-modal sarcasm detection (MSD) is a challenging and intricate task. Despite the strides made by existing models, two principal hurdles persist. First, prevailing methods address only superficial disparities between textual inputs and their associated images, neglecting nuanced inter-modal combinations. Second, sarcastic instances frequently involve intricate emotional expressions, making it imperative to leverage emotional cues across modalities to discern sarcastic nuances. Accordingly, this research proposes a deep graph convolutional network that integrates cross-modal mapping information to effectively identify significant incongruent sentiment expressions across modalities for multi-modal sarcasm detection. Specifically, we first design a cross-modal mapping network that obtains interaction information between the two modalities by pairwise mapping of text and image feature vectors, compensating for the information multi-modal data lose during fusion. Additionally, we employ external knowledge in the form of ANPs (adjective-noun pairs) as a bridge to construct cross-correlation graphs from highly correlated sarcastic cues and their connection weights between the image and text modalities. A GCN architecture with a retrieval-based attention mechanism then effectively captures the sarcastic cues. Experiments conducted on two publicly available datasets demonstrate a significant performance improvement of our method over numerous contemporary models.
Pages: 11
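
The abstract names two concrete mechanisms: pairwise cross-modal mapping between text and image feature vectors, and graph convolution over a weighted cross-correlation graph built from ANP cues. Below is a minimal PyTorch sketch of those two mechanisms only; it is not the authors' implementation, and all module names, feature dimensions, and the use of token-region affinities as stand-in edge weights are assumptions for illustration.

```python
# Illustrative sketch only: the paper's code is not reproduced here, so the
# module names, dimensions, and fusion details below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalMapping(nn.Module):
    """Project text and image features into a shared space and interact them."""

    def __init__(self, d_text: int, d_image: int, d_model: int):
        super().__init__()
        self.text_proj = nn.Linear(d_text, d_model)
        self.image_proj = nn.Linear(d_image, d_model)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor):
        # text_feats: (n_tokens, d_text); image_feats: (n_regions, d_image)
        t = self.text_proj(text_feats)       # (n_tokens, d_model)
        v = self.image_proj(image_feats)     # (n_regions, d_model)
        # Pairwise token-region affinities, standing in for the abstract's
        # pairwise mapping of text and image feature vectors.
        affinity = t @ v.T                   # (n_tokens, n_regions)
        return t, v, affinity


class GCNLayer(nn.Module):
    """One graph-convolution step over a weighted cross-correlation graph."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.proj = nn.Linear(d_in, d_out)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Add self-loops, then symmetrically normalize: D^{-1/2} A D^{-1/2}.
        a = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = a.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)
        return F.relu(self.proj(a_norm @ nodes))


# Toy usage: 4 text tokens, 3 image regions/ANP cues (dimensions invented).
text_feats = torch.randn(4, 768)
image_feats = torch.randn(3, 2048)
mapper = CrossModalMapping(d_text=768, d_image=2048, d_model=256)
t, v, affinity = mapper(text_feats, image_feats)

# One joint graph over text and image nodes; cross-modal edge weights are
# taken from the affinity scores as a hypothetical stand-in for the paper's
# ANP-based correlation weights.
nodes = torch.cat([t, v], dim=0)             # (7, 256)
adj = torch.zeros(7, 7)
adj[:4, 4:] = affinity.softmax(dim=-1)       # text -> image edges
adj[4:, :4] = affinity.softmax(dim=-1).T     # image -> text edges
out = GCNLayer(256, 256)(nodes, adj)         # (7, 256) refined node states
```

In the full pipeline the abstract describes, the edge weights would instead come from ANP cues retrieved as external knowledge, and the refined node states would feed a retrieval-based attention mechanism before classification.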