Attention-based multi-modal fusion sarcasm detection

Times cited: 1
Authors
Liu, Jing [1]
Tian, Shengwei [1]
Yu, Long [2]
Long, Jun [3,4]
Zhou, Tiejun [5]
Wang, Bo [1]
Affiliations
[1] Xinjiang Univ, Sch Software, Urumqi, Xinjiang, Peoples R China
[2] Xinjiang Univ, Network & Informat Ctr, Urumqi, Xinjiang, Peoples R China
[3] Cent South Univ, Sch Informat Sci & Engn, Changsha, Peoples R China
[4] Cent South Univ, Big Data & Knowledge Engn Inst, Changsha, Peoples R China
[5] Xinjiang Internet Informat Ctr, Urumqi, Xinjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-modal; sarcasm detection; Attention; ViT; D-BiGRU
DOI
10.3233/JIFS-213501
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sarcasm is a way for a person to express their thoughts in which the intended meaning is often the opposite of the apparent meaning. Previous work on sarcasm detection focused mainly on text, but most information today is multi-modal, combining text and images. Multi-modal sarcasm detection has therefore become an increasingly active research topic. To better detect the intended meaning of multi-modal sarcastic content, this paper proposes an attention-based multi-modal fusion model for sarcasm detection. The model uses a Vision Transformer (ViT) to extract image features and a purpose-designed Double-Layer Bi-Directional Gated Recurrent Unit (D-BiGRU) to extract text features. The features of the two modalities are fused into a single feature vector, which is enhanced with attention before prediction. On the benchmark datasets, the proposed model outperforms the best baseline model by 0.71% in F1-score and 0.38% in accuracy.
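The fusion-then-attention pipeline summarized above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The ViT and D-BiGRU encoders are replaced by precomputed feature vectors, and the fusion operator (plain concatenation), the element-wise softmax attention with a residual connection, and the logistic classifier are all assumptions chosen for illustration. Every function name (`fuse`, `attention_enhance`, `predict_sarcasm`) is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fuse(text_feat: Sequence[float], image_feat: Sequence[float]) -> List[float]:
    """Assumed fusion step: concatenate text (D-BiGRU) and image (ViT) features."""
    return list(text_feat) + list(image_feat)


def attention_enhance(
    fused: Sequence[float], attn_w: Sequence[float], attn_b: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical attention enhancement: score each fused feature,
    normalize the scores with softmax across positions, and add the
    reweighted feature back to the original (residual connection)."""
    scores = [w * x + b for x, w, b in zip(fused, attn_w, attn_b)]
    alpha = softmax(scores)
    enhanced = [x * (1.0 + a) for x, a in zip(fused, alpha)]
    return enhanced, alpha


def predict_sarcasm(
    text_feat: Sequence[float],
    image_feat: Sequence[float],
    attn_w: Sequence[float],
    attn_b: Sequence[float],
    cls_w: Sequence[float],
    cls_b: float,
) -> float:
    """Return P(sarcastic) from a logistic classifier over the enhanced vector."""
    enhanced, _ = attention_enhance(fuse(text_feat, image_feat), attn_w, attn_b)
    logit = sum(w * x for w, x in zip(cls_w, enhanced)) + cls_b
    return sigmoid(logit)


if __name__ == "__main__":
    # Toy 2-d text and 2-d image vectors standing in for encoder outputs.
    p = predict_sarcasm([1.0, 0.0], [0.0, 1.0], [0.0] * 4, [0.0] * 4, [1.0] * 4, -2.5)
    print(p)  # prints 0.5
```

With all attention weights and biases at zero, the softmax assigns uniform weights (0.25 each here), so the enhanced vector is simply the fused vector scaled by 1.25. In a trained model the attention parameters would learn to emphasize the text or image features most indicative of sarcasm; the residual form keeps the original features intact when the attention is uninformative.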
Pages: 2097-2108
Number of pages: 12