Malicious web content detection by machine learning

Cited by: 62
Authors
Hou, Yung-Tsung [1]
Chang, Yimeng [2]
Chen, Tsuhan [2]
Laih, Chi-Sung [3]
Chen, Chia-Mei [1]
Affiliations
[1] Natl Sun Yat Sen Univ, Kaohsiung 80424, Taiwan
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Natl Cheng Kung Univ, Tainan 70101, Taiwan
Keywords
Dynamic HTML; Malicious webpage; Machine learning
DOI
10.1016/j.eswa.2009.05.023
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent developments in dynamic HTML (DHTML) give attackers a new and powerful technique for compromising computer systems. Malicious DHTML code is usually embedded in an otherwise normal webpage, and it infects the victim when a user browses that page. Furthermore, such DHTML code can easily disguise itself through obfuscation or transformation, which makes detection even harder. Anti-virus software packages commonly use signature-based approaches, which might not efficiently identify camouflaged malicious HTML code. Therefore, our paper proposes a method for detecting malicious webpages using machine learning. Our study systematically analyzes the characteristics of malicious webpages and presents important features for machine learning. Experimental results demonstrate that our method is resilient to code obfuscation and can correctly determine whether a webpage is malicious. © 2009 Elsevier Ltd. All rights reserved.
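The contrast the abstract draws between signature matching and learned features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the signature string, the sample pages, and the feature names are hypothetical, and the abstract does not list the features the authors actually used. The sketch only shows why a literal signature can miss a trivially rewritten script while coarse counts over the same script stay unchanged; such counts would then be passed to a classifier, which is not shown here.

```python
import re

# Hypothetical literal signature of the kind a signature-based scanner might use.
SIGNATURE = "document.write(unescape("

# Two hypothetical pages with the same behavior; the second is lightly obfuscated.
PLAIN = '<script>document.write(unescape("%3Cb%3E"))</script>'
OBFUSCATED = '<script>var d=document;d["wri"+"te"](unescape("%3Cb%3E"))</script>'


def signature_match(html: str) -> bool:
    """Return True if the literal signature appears anywhere in the page."""
    return SIGNATURE in html


def extract_features(html: str) -> dict:
    """Count coarse properties of inline scripts (illustrative, not the paper's features)."""
    scripts = re.findall(r"<script\b[^>]*>(.*?)</script>", html, flags=re.I | re.S)
    code = "\n".join(scripts)
    return {
        # Number of inline script blocks on the page.
        "script_blocks": len(scripts),
        # Percent-encoded, \x-encoded, or %u-encoded characters inside scripts.
        "escaped_chars": len(
            re.findall(r"%u[0-9A-Fa-f]{4}|%[0-9A-Fa-f]{2}|\\x[0-9A-Fa-f]{2}", code)
        ),
        # Calls commonly used to decode or execute strings at run time.
        "decode_calls": len(
            re.findall(r"\b(?:eval|unescape|fromCharCode)\s*\(", code)
        ),
    }


if __name__ == "__main__":
    for name, page in (("plain", PLAIN), ("obfuscated", OBFUSCATED)):
        print(name, signature_match(page), extract_features(page))
```

In this toy case the signature matches only the plain page, while both pages produce the same feature counts. Real obfuscation is far more varied, so this should be read as an intuition for the approach and not as evidence for the reported results.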
Pages: 55-60
Number of pages: 6