Polymorphic Malware Detection Using Sequence Classification Methods

Cited by: 40
Authors
Drew, Jake [1]
Moore, Tyler [2]
Hahsler, Michael [1]
Affiliations
[1] Southern Methodist Univ, Comp Sci & Engn Dept, Dallas, TX 75275 USA
[2] Univ Tulsa, Tandy Sch Comp Sci, Tulsa, OK 74104 USA
Source
2016 IEEE SYMPOSIUM ON SECURITY AND PRIVACY WORKSHOPS (SPW 2016) | 2016
Keywords
SEARCH
DOI
10.1109/SPW.2016.30
Chinese Library Classification
TP301 [Theory, Methods]
Discipline Classification Code
081202
Abstract
Polymorphic malware detection is challenging because miscreants continually introduce mutations into successive instances of a particular virus. Such changes are akin to mutations in biological sequences. Recently, the bioinformatics and computational biology communities have developed high-throughput methods for gene sequence classification. In this paper, we argue that these methods can be usefully applied to malware detection. Unfortunately, gene classification tools are usually optimized for, and restricted to, an alphabet of four letters (nucleic acids). Consequently, we selected the Strand gene sequence classifier, which offers a robust classification strategy that easily accommodates unstructured data over any alphabet, including source code or compiled machine code. To demonstrate Strand's suitability for classifying malware, we run it on approximately 500 GB of malware data from the Kaggle Microsoft Malware Classification Challenge (BIG 2015), where the task is to predict 9 classes of polymorphic malware. Experiments show that, with minimal adaptation, the method achieves accuracy well above 95% while requiring only a fraction of the training time used by the winning team's method.
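The abstract's key claim is that a sequence classifier which is not tied to a four-letter nucleotide alphabet can work directly on malware bytes. The following is a minimal sketch of that idea, assuming a generic byte-level k-mer plus MinHash scheme; it is not the paper's or Strand's implementation. The names `kmers` and `MinHashClassifier`, the parameters `k`, `num_hashes` and `seed`, the match-counting score, and the toy byte strings are all hypothetical choices made for illustration.

```python
import random

MERSENNE_PRIME = (1 << 61) - 1  # prime modulus for the universal hash family


def kmers(data: bytes, k: int) -> set[bytes]:
    """Return the set of overlapping length-k substrings of data (any byte alphabet)."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


class MinHashClassifier:
    """Hypothetical k-mer/MinHash sequence classifier (illustrative, not Strand's code)."""

    def __init__(self, k: int = 4, num_hashes: int = 64, seed: int = 0) -> None:
        if not 1 <= k <= 7:
            raise ValueError("k must be in 1..7 so k-mer integers stay below the prime")
        self.k = k
        rng = random.Random(seed)
        # Each (a, b) pair defines h(x) = (a * x + b) mod p with a != 0.
        self.params = [
            (rng.randrange(1, MERSENNE_PRIME), rng.randrange(MERSENNE_PRIME))
            for _ in range(num_hashes)
        ]
        # label -> for each hash function, the set of minimum values seen in training
        self.profiles: dict[str, list[set[int]]] = {}

    def signature(self, data: bytes) -> list[int]:
        """MinHash signature: per hash function, the minimum hash over all k-mers."""
        ids = [int.from_bytes(km, "big") for km in kmers(data, self.k)]
        if not ids:
            raise ValueError("input is shorter than k")
        return [min((a * x + b) % MERSENNE_PRIME for x in ids) for a, b in self.params]

    def fit(self, samples: list[bytes], labels: list[str]) -> "MinHashClassifier":
        for data, label in zip(samples, labels):
            profile = self.profiles.setdefault(label, [set() for _ in self.params])
            for slot, value in zip(profile, self.signature(data)):
                slot.add(value)
        return self

    def predict(self, data: bytes) -> str:
        sig = self.signature(data)

        def score(label: str) -> int:
            # Count hash functions whose minimum also occurred in this class's training data.
            return sum(value in slot for value, slot in zip(sig, self.profiles[label]))

        return max(self.profiles, key=score)


if __name__ == "__main__":
    clf = MinHashClassifier(k=4, num_hashes=64, seed=1).fit(
        [b"PUSH EBP MOV EBP ESP CALL DECRYPT", b"PUSH EBP MOV EBP ESP CALL DECRYPT2",
         b"xor ecx ecx jmp loop ret nop nop", b"xor ecx ecx jmp loop2 ret nop nop"],
        ["family_A", "family_A", "family_B", "family_B"],
    )
    print(clf.predict(b"PUSH EBP MOV EBP ESP CALL DECRYPX"))
```

Because k-mers are taken over raw bytes, the same sketch applies equally to disassembly text or binary file content. A one-byte mutation changes at most `k` of a sample's k-mers, so a lightly mutated variant tends to keep most of its signature entries in common with its family's training profile.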
Pages: 81-87
Number of pages: 7
References: 27