Towards an Understanding of the Misclassification Rates of Machine Learning-based Malware Detection Systems

Cited by: 2
Authors
Alruhaily, Nada [1]
Bordbar, Behzad [1]
Chothia, Tom [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham, W Midlands, England
Source
ICISSP: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY | 2017
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Malware; Classification Algorithms; Machine Learning; Behavioural Analysis;
DOI
10.5220/0006174301010112
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
A number of machine learning-based malware detection systems have been proposed to replace signature-based detection methods. These systems have been shown to achieve a high detection rate on previously unseen malware samples. However, in systems based on behavioural features, some new malware can go undetected because its behaviour differs from that of the training data. In this paper we analysed misclassified malware instances and investigated whether there were recognisable patterns across these misclassifications. Several questions needed to be answered: Can we claim that changes in malware over time directly affect the detection rate? Do the changes that affect classification occur at the level of families, so that all instances belonging to certain families are hard to detect? Alternatively, can such changes be traced back to certain malware variants rather than families? Our experiments showed that these changes are mostly due to behavioural changes at the level of variants across malware families, where variants did not behave as expected. This can be because the variants adopted anti-virtualisation techniques, because they required a specific argument to be activated, or because they were actually corrupted.
Pages: 101-112
Page count: 12