LMBot: Distilling Graph Knowledge into Language Model for Graph-less Deployment in Twitter Bot Detection

Cited by: 7
Authors
Cai, Zijian [1]
Tan, Zhaoxuan [2]
Lei, Zhenyu [3]
Zhu, Zifeng [1]
Wang, Hongrui [1]
Zheng, Qinghua [1]
Luo, Minnan [1]
Affiliations
[1] Xi'an Jiaotong University, Xi'an, Shaanxi, People's Republic of China
[2] University of Notre Dame, Notre Dame, IN 46556, USA
[3] University of Virginia, Charlottesville, VA 22901, USA
Source
PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Twitter Bot Detection; Knowledge Distillation; Social Network Analysis;
DOI
10.1145/3616855.3635843
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As malicious actors employ increasingly advanced and widespread bots to disseminate misinformation and manipulate public opinion, detecting Twitter bots has become a crucial task. Although graph-based Twitter bot detection methods achieve state-of-the-art performance, we find that their inference depends on neighboring users multiple hops away from the target users, and fetching these neighbors is time-consuming and may introduce sampling bias. At the same time, our experiments reveal that, after fine-tuning on the Twitter bot detection task, pretrained language models achieve competitive performance without requiring a graph structure during deployment. Inspired by this finding, we propose LMBot(1), a novel bot detection framework that distills graph knowledge into language models (LMs) for graph-less deployment in Twitter bot detection, addressing the graph data dependency challenge. Moreover, LMBot is compatible with both graph-based and graph-less datasets. Specifically, we first represent each user as a textual sequence and feed these sequences into the LM for domain adaptation. For graph-based datasets, the output of the LM serves as the input features for a GNN, enabling LMBot to optimize for bot detection and distill knowledge back into the LM in an iterative, mutually enhancing process. Equipped with the LM, we can perform graph-less inference that still carries graph knowledge, which resolves the graph data dependency and sampling bias issues. For datasets without graph structure, we simply replace the GNN with an MLP, which also yields strong performance. Our experiments demonstrate that LMBot achieves state-of-the-art performance on four Twitter bot detection benchmarks. Extensive studies further show that LMBot is more robust, versatile, and efficient than existing graph-based Twitter bot detection methods.
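The abstract states that the GNN's knowledge is distilled back into the LM, but this record does not give the exact objective. As a rough illustration only, assuming a standard Hinton-style distillation loss (not necessarily LMBot's actual formulation), the student LM's loss for one user could blend cross-entropy on the gold label with a temperature-softened KL divergence toward the teacher GNN's prediction. The function names `softmax` and `distillation_loss`, the `temperature` and `alpha` values, and the label convention (0 = human, 1 = bot) are all hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,  # hypothetical value, for illustration only
    alpha: float = 0.5,        # hypothetical weight on the soft-label term
) -> float:
    """Assumed Hinton-style objective: (1 - alpha) * CE + alpha * T^2 * KL(teacher || student).

    student_logits: LM (student) scores per class, e.g. [human, bot] (assumed convention).
    teacher_logits: GNN (teacher) scores for the same user.
    """
    # Hard-label term: cross-entropy of the student against the gold label.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # Soft-label term: KL divergence from the teacher's softened distribution
    # to the student's softened distribution.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)

    # Scaling by T^2 keeps the soft term's gradient magnitude comparable across temperatures.
    return (1 - alpha) * ce + alpha * (temperature ** 2) * kl


if __name__ == "__main__":
    # Student agrees with the teacher, so the KL term vanishes and only CE remains.
    print(distillation_loss([1.0, 1.0], [1.0, 1.0], label=0))
```

Under this assumed setup, a student that already matches the teacher pays only the hard-label cost, while disagreement with the GNN's softened prediction adds a non-negative penalty; in an iterative scheme like the one described, the teacher logits would be recomputed by the GNN after each round.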
Pages: 57-66
Number of pages: 10