Hierarchical Semantic Enhancement Network for Multimodal Fake News Detection

Cited by: 7
Authors
Zhang, Qiang [1]
Liu, Jiawei [1]
Zhang, Fanrui [1]
Xie, Jingyi [1]
Zha, Zheng-Jun [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Fake news detection; Semantic information; Multimodal; Entity; Attention
DOI
10.1145/3581783.3612423
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The explosion of multimodal fake news on social media has sparked widespread concern. Existing multimodal fake news detection methods have advanced the field, but they fail to adequately exploit the latent semantic information in images and ignore the noise embedded in news entities, which severely limits model performance. In this paper, we propose a novel Hierarchical Semantic Enhancement Network (HSEN) for multimodal fake news detection that learns text-related image semantics and precise high-order knowledge semantics of the news. Specifically, to complement image semantic information, HSEN uses textual entities as the prompt subject vocabulary and applies reinforcement learning to discover the optimal prompt format for generating image captions specific to those textual entities; the resulting captions contain multi-level cross-modal correlation information. Moreover, HSEN extracts visual and textual entities from the image and text, and identifies additional visual entities from the image captions to extend image semantic knowledge. On this basis, HSEN uses an adaptive hard attention mechanism to automatically select strongly related news entities and remove irrelevant noise entities, yielding precise high-order knowledge semantic information while generating an attention mask that guides cross-modal knowledge interaction. Extensive experiments show that our method outperforms state-of-the-art methods.
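The entity selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine scoring, the mean-score threshold standing in for the "adaptive" criterion, and the names `hard_select` and `masked_softmax` are all assumptions made for illustration. It shows only the general idea of a hard (binary) mask that discards weakly related entities before attention weights are computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hard_select(news_vec, entity_vecs):
    """Score each entity against the news representation and build a 0/1 mask.

    Assumption: the threshold is the mean score, so it adapts to each news item.
    The paper's actual selection rule is not given in the abstract.
    """
    if not entity_vecs:
        return [], []
    scores = [cosine(news_vec, e) for e in entity_vecs]
    threshold = sum(scores) / len(scores)
    mask = [1 if s >= threshold else 0 for s in scores]
    return scores, mask


def masked_softmax(scores, mask):
    """Softmax over the kept entities only; masked-out entities get weight 0."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        return [0.0] * len(scores)
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: the second entity is orthogonal to the news vector and is dropped.
news = [1.0, 0.0]
entities = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
scores, mask = hard_select(news, entities)
weights = masked_softmax(scores, mask)
```

In this toy setting the mask plays the role the abstract assigns to the attention mask: downstream cross-modal interaction would only attend to entities whose mask value is 1.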
Pages: 3424-3433
Page count: 10
Related papers
54 in total
[1] Anderson, Peter; He, Xiaodong; Buehler, Chris; Teney, Damien; Johnson, Mark; Gould, Stephen; Zhang, Lei. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[2] [Anonymous], 2022, INT C MACH LEARN
[3] [Anonymous], 2015, P 24 ACM INT C INF K
[4] [Anonymous], 2012, P 50 ANN M ASS COMPU
[5] Bollacker K, 2008, Proceedings of SIGMOD, SIGMOD '08, P1247
[6] Bordes A., 2013, P ADV NEUR INF PROC, P2787, DOI 10.5555/2999792.2999923
[7] Chen, Shizhe; Jin, Qin; Wang, Peng; Wu, Qi. Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 9959-9968
[8] Chen, Yixuan; Li, Dongsheng; Zhang, Peng; Sui, Jie; Lv, Qin; Lu, Tun; Shang, Li. Cross-modal Ambiguity Learning for Multimodal Fake News Detection [J]. PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022: 2897-2905
[9] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[10] Deng Mingkai, 2022, ARXIV220512548