Empirical Measurement of Performance Maintenance of Gradient Boosted Decision Tree Models for Malware Detection

Times cited: 5
Authors
Galen, Colin [1]
Steele, Robert [2]
Affiliations
[1] Capitol Technol Univ, Comp Sci Lab, 11301 Springfield Rd, Laurel, MD 20708 USA
[2] Capitol Technol Univ, Dept Comp Sci, 11301 Springfield Rd, Laurel, MD 20708 USA
Source
3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021) | 2021
Keywords
malware detection; artificial intelligence; performance maintenance; lightGBM; CatBoost; XGBoost;
DOI
10.1109/ICAIIC51459.2021.9415220
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Important for effective, real-world machine learning (ML) or artificial intelligence (AI)-based malware detection systems is that models demonstrate both high discriminative performance at time of training and a high level of performance maintenance over time subsequent to training. That is, it is desirable that the models have a slow rate of performance decline over time as they encounter previously unseen malware threats. The study of malware detection model empirical performance maintenance on real-world data sets has not been widely addressed despite significant work on ML-based malware detection in general. In this work, we evaluate performance maintenance characteristics of models using a large, one million instance malware-goodware dataset spanning executables collected over one year in duration. Based on the outperformance of gradient boosted decision tree-based models, we investigate this category of model further and demonstrate models with performance and performance maintenance superior to that demonstrated in the previous ML-based malware detection literature. Given the large size of the dataset of real-world executables utilized, the insights into model performance maintenance may have valuable implications for real-world ML-based malware detection systems.
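The idea of performance maintenance described in the abstract can be illustrated with a small sketch. It is not the authors' evaluation code: it assumes predictions from an already trained model are available as (period, true label, predicted label) tuples, and it uses accuracy and a least-squares slope as stand-ins for whatever metric and decline measure the paper reports. The function names `accuracy_by_period` and `decline_rate` are hypothetical.

```python
from collections import defaultdict


def accuracy_by_period(records):
    """Group (period, y_true, y_pred) tuples by period and return
    {period: accuracy}, ordered by period."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for period, y_true, y_pred in records:
        totals[period] += 1
        hits[period] += int(y_true == y_pred)
    return {p: hits[p] / totals[p] for p in sorted(totals)}


def decline_rate(period_scores):
    """Least-squares slope of score against period index (0, 1, 2, ...).
    A negative slope means performance falls as time since training grows.
    Assumption: periods are evenly spaced."""
    scores = list(period_scores.values())
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two periods to estimate a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    cov = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Toy data (invented for illustration): 1 = malware, 0 = goodware,
# with four test samples in each of three months after training.
toy_records = [
    (1, 1, 1), (1, 0, 0), (1, 1, 1), (1, 0, 0),
    (2, 1, 1), (2, 0, 0), (2, 1, 0), (2, 0, 0),
    (3, 1, 0), (3, 0, 0), (3, 1, 0), (3, 0, 0),
]
toy_scores = accuracy_by_period(toy_records)
toy_slope = decline_rate(toy_scores)
```

A model with good performance maintenance in this sense would show a slope close to zero across the months following training.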
Pages: 193-198
Page count: 6