Statistical Detection of Online Drifting Twitter Spam [Invited Paper]

Cited by: 43
Authors
Liu, Shigang [1]
Zhang, Jun [1]
Xiang, Yang [1]
Affiliation
[1] Deakin Univ, Sch Informat Technol, 221 Burwood Hwy, Burwood, Vic 3125, Australia
Source
ASIA CCS'16: PROCEEDINGS OF THE 11TH ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY | 2016
Keywords
Twitter spam detection; social network security; security data analytics
DOI
10.1145/2897845.2897928
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Spam has become a critical problem in online social networks. This paper focuses on Twitter spam detection. Recent research applies machine learning techniques to Twitter spam detection, making use of the statistical features of tweets. We observe that existing machine-learning-based detection methods suffer from the problem of Twitter spam drift, i.e., the statistical properties of spam tweets vary over time. One effective way to avoid this problem is to train a new Twitter spam classifier every day. However, this approach faces the challenge of a small, imbalanced training set, because labelling spam samples is time-consuming. This paper proposes a new method to address this challenge. The new method employs two new techniques, fuzzy-based redistribution and asymmetric sampling. We develop a fuzzy-based information decomposition technique to redistribute the spam class and generate more spam samples. Moreover, an asymmetric sampling technique is proposed to rebalance the sizes of spam samples and non-spam samples in the training data. Finally, we apply an ensemble technique to combine the spam classifiers trained over two different training sets. A number of experiments are performed on a real-world 10-day ground-truth dataset to evaluate the new method. Experimental results show that the new method can significantly improve the detection performance for drifting Twitter spam.
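As a rough illustration of the rebalancing idea in the abstract, the Python sketch below oversamples the minority spam class and undersamples the majority non-spam class to a common size. It is not the paper's algorithm: the abstract does not specify how asymmetric sampling is carried out, and the function `rebalance`, its `target` parameter, and the plain random resampling with replacement (standing in for the fuzzy-based generation of new spam samples) are all assumptions made here for illustration.

```python
import random


def rebalance(spam, ham, target, seed=0):
    """Return (spam_out, ham_out), each containing exactly `target` samples.

    Hypothetical helper: a class with fewer than `target` samples is
    oversampled with replacement; a class with more is undersampled
    without replacement.
    """
    rng = random.Random(seed)

    def resize(samples):
        samples = list(samples)
        if len(samples) >= target:
            # Undersample: draw `target` distinct samples.
            return rng.sample(samples, target)
        # Oversample: keep every original, then repeat random ones.
        extra = rng.choices(samples, k=target - len(samples))
        return samples + extra

    return resize(spam), resize(ham)


# Example: 5 labelled spam tweets versus 100 non-spam tweets.
spam = [("spam", i) for i in range(5)]
ham = [("ham", i) for i in range(100)]
spam_bal, ham_bal = rebalance(spam, ham, target=20)
print(len(spam_bal), len(ham_bal))
```

Repeating existing spam samples adds no new information, which is presumably why the abstract pairs rebalancing with a technique that generates additional spam samples rather than duplicating them.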
Pages: 1-10
Page count: 10