Uncovering Social Spammers: Social Honeypots plus Machine Learning

Cited by: 284
Authors
Lee, Kyumin [1]
Caverlee, James [1]
Webb, Steve
Affiliations
[1] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77843 USA
Source
SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2010
Keywords
social media; social honeypots; spam
DOI
10.1145/1835449.1835522
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Web-based social systems enable new community-based opportunities for participants to engage, share, and interact. This community value and related services like search and advertising are threatened by spammers, content polluters, and malware disseminators. In an effort to preserve community value and ensure long-term success, we propose and evaluate a honeypot-based approach for uncovering social spammers in online social systems. Two of the key components of the proposed approach are: (1) the deployment of social honeypots for harvesting deceptive spam profiles from social networking communities; and (2) statistical analysis of the properties of these spam profiles for creating spam classifiers to actively filter out existing and new spammers. We describe the conceptual framework and design considerations of the proposed approach, and we present concrete observations from the deployment of social honeypots in MySpace and Twitter. We find that the deployed social honeypots identify social spammers with low false positive rates and that the harvested spam data contains signals that are strongly correlated with observable profile features (e.g., content, friend information, and posting patterns). Based on these profile features, we develop machine learning based classifiers for identifying previously unknown spammers with high precision and a low rate of false positives.
Pages: 435-442
Page count: 8