Empirical Measurement of Performance Maintenance of Gradient Boosted Decision Tree Models for Malware Detection

Cited by: 5

Authors:

Galen, Colin [1]

Steele, Robert [2]

Affiliations:

[1] Capitol Technol Univ, Comp Sci Lab, 11301 Springfield Rd, Laurel, MD 20708 USA

[2] Capitol Technol Univ, Dept Comp Sci, 11301 Springfield Rd, Laurel, MD 20708 USA

Source:

3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021) | 2021

Keywords:

malware detection; artificial intelligence; performance maintenance; lightGBM; CatBoost; XGBoost;

DOI:

10.1109/ICAIIC51459.2021.9415220

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Important for effective, real-world machine learning (ML) or artificial intelligence (AI)-based malware detection systems is that models demonstrate both high discriminative performance at time of training and a high level of performance maintenance over time subsequent to training. That is, it is desirable that the models have a slow rate of performance decline over time as they encounter previously unseen malware threats. The study of malware detection model empirical performance maintenance on real-world data sets has not been widely addressed despite significant work on ML-based malware detection in general. In this work, we evaluate performance maintenance characteristics of models using a large, one-million-instance malware-goodware dataset spanning executables collected over one year in duration. Based on the outperformance of gradient boosted decision tree-based models, we investigate this category of model further and demonstrate models with performance and performance maintenance superior to that demonstrated in the previous ML-based malware detection literature. Given the large size of the dataset of real-world executables utilized, the insights into model performance maintenance may have valuable implications for real-world ML-based malware detection systems.

Pages: 193-198

Page count: 6
