Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach

Cited by: 14
Authors
Kumar, Rajesh [1]
Subbiah, Geetha [1]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai Campus, Chennai 600127, Tamil Nadu, India
Keywords
machine learning; computer security; artificial intelligence; boosting; bagging; cyber security; zero-day vulnerability; zero-day malware detection; Shapley value
DOI
10.3390/s22072798
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Software products from all vendors have vulnerabilities that can cause security concerns, and malware is a prime tool for exploiting them. Machine learning (ML) methods are efficient at detecting malware and represent the state of the art. The effectiveness of ML models can be improved by reducing false negatives and false positives. In this paper, the performance of bagging and boosting machine learning models is enhanced by reducing misclassification. The Shapley values of features are a true representation of how much each feature contributes, and they help identify the top features behind any prediction by the ML model. Shapley values are transformed to a probability scale so that they correlate with the prediction value of a trained ML model and reveal the top features for any of its predictions. The trend of top features derived from false negative and false positive predictions by a trained ML model can be used to make inductive rules. In this work, the best performing ML model among the bagging and boosting models is determined by accuracy and the confusion matrix on three malware datasets from three different periods. The best performing ML model is used to make effective inductive rules using waterfall plots based on the probability scale of features. This work helps improve cyber security by effectively detecting false-negative zero-day malware.
Pages: 23