Real-time Classification of Malicious URLs on Twitter using Machine Activity Data

Cited by: 10
Authors
Burnap, Pete [1]
Javed, Amir [1]
Rana, Omer F. [1]
Awan, Malik S. [1]
Affiliations
[1] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF10 3AX, S Glam, Wales
Source
PROCEEDINGS OF THE 2015 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2015) | 2015
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
DOI
10.1145/2808797.2809281
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Massive online social networks with hundreds of millions of active users are increasingly being used by cyber criminals to spread malicious software (malware) to exploit vulnerabilities on the machines of users for personal gain. Twitter is particularly susceptible to such activity as, with its 140-character limit, it is common for people to include URLs in their tweets to link to more detailed information, evidence, news reports and so on. URLs are often shortened so the endpoint is not obvious before a person clicks the link. Cyber criminals can exploit this to propagate malicious URLs on Twitter, for which the endpoint is a malicious server that performs unwanted actions on the person's machine. This is known as a drive-by download. In this paper we develop a machine classification system to distinguish between malicious and benign URLs within seconds of the URL being clicked (i.e. 'real-time'). We train the classifier using machine activity logs created while interacting with URLs extracted from Twitter data collected during a large global event - the Superbowl - and test it using data from another large sporting event - the Cricket World Cup. The results show that machine activity logs produce precision of up to 0.975 on training data from the first event and 0.747 on test data from the second event. Furthermore, we examine the properties of the learned model to explain the relationship between machine activity and malicious software behaviour, and build a learning curve for the classifier to illustrate that very small samples of training data can be used with only a small detriment to performance.
Pages: 970-977
Number of pages: 8