Building a practical and reliable classifier for malware detection

Cited by: 0
Authors
Vatamanu, Cristina [1 ,2 ]
Gavrilut, Dragos [2 ,3 ]
Benchea, Razvan-Mihai [2 ,3 ]
Affiliations
[1] Gheorghe Asachi Univ, Iasi, Romania
[2] Bitdefender, Iasi, Romania
[3] Alexandru Ioan Cuza Univ, Iasi, Romania
Keywords
Malware detection; One side class algorithm; False positives; Machine learning; Large data sets;
DOI
10.1007/s11416-013-0188-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A machine learning algorithm that can correctly classify malicious software has become a necessity, as older detection methods based on hashes and hand-written heuristics tend to fail when dealing with the intensive flow of new malware. However, to be practical, machine learning classifiers must also have a reasonable training time and a very small number of false positives, preferably zero. A few authors have addressed both of these issues, but creating such a model is more difficult when more than 3 million files are needed for training. By mapping a zero false positive perceptron into a new space, applying a feature selection algorithm, and using the resulting model in an ensemble, voting, or rule-based clustering system, we achieved a detection rate of around 99% with 0.07% false positives while keeping the training time suitable for large data sets.
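The abstract does not spell out how the zero false positive perceptron is built. As a rough illustration only, the sketch below trains a plain perceptron and then lowers its bias until no benign training sample is flagged, trading detection rate for zero training false positives. The bias-shift step, the function names, and the toy feature vectors are all assumptions made for this example; they are not the authors' algorithm or data.

```python
# Minimal sketch, NOT the paper's method: a plain perceptron followed by a
# bias shift that forces zero false positives on the training set.

def score(w, b, x):
    """Linear score; a sample is flagged as malicious when the score is > 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """Standard perceptron updates; labels are +1 (malicious) or -1 (benign)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * score(w, b, x) <= 0:  # misclassified or on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def shift_to_zero_fp(w, b, samples, labels, margin=1e-6):
    """Assumed simplification: lower the bias until every benign sample scores < 0."""
    worst = max(score(w, b, x) for x, y in zip(samples, labels) if y == -1)
    if worst >= 0:
        b -= worst + margin
    return w, b

def rates(w, b, samples, labels):
    """Return (detection rate, false positive rate) on the given samples."""
    mal = [x for x, y in zip(samples, labels) if y == 1]
    ben = [x for x, y in zip(samples, labels) if y == -1]
    detected = sum(score(w, b, x) > 0 for x in mal)
    false_pos = sum(score(w, b, x) > 0 for x in ben)
    return detected / len(mal), false_pos / len(ben)

# Hypothetical binary feature vectors (e.g. presence of some file traits).
malicious = [[1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1]]
benign = [[0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]
samples = malicious + benign
labels = [1] * len(malicious) + [-1] * len(benign)

w, b = train_perceptron(samples, labels)
w, b = shift_to_zero_fp(w, b, samples, labels)
detection_rate, fp_rate = rates(w, b, samples, labels)
print(f"detection rate: {detection_rate:.2f}, false positive rate: {fp_rate:.2f}")
```

The zero false positive guarantee in this sketch holds only for the samples used to set the bias; on unseen files the false positive rate can be nonzero, which is consistent with the 0.07% figure the abstract reports.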
Pages: 205-214
Number of pages: 10