Explainable AI model for PDFMal detection based on gradient boosting model

Cited by: 2
Authors
Elattar, Mona [1, 2]
Younes, Ahmed [2]
Gad, Ibrahim [1]
Elkabani, Islam [2, 3]
Affiliations
[1] Department of Computer Science, Faculty of Science, Tanta University, Tanta
[2] Department of Mathematics and Computer Science, Faculty of Science, Alexandria University, Alexandria
[3] Faculty of Computer Science and Engineering, Al Alamein International University, Alamein
Keywords
Explainable artificial intelligence (XAI); Malicious PDF; Malware detection; Tree-based ensemble models
DOI
10.1007/s00521-024-10314-y
Abstract
Portable Document Format (PDF) files are widely used for document exchange because of their versatility. However, PDFs are highly vulnerable to malware attacks, which pose significant security risks. Existing defense mechanisms often struggle to detect and mitigate these threats effectively, highlighting the need for more robust solutions. This paper introduces a framework that uses tree-based ensemble models to detect malicious PDFs on the Evasive-PDFMal2022 dataset. The proposed model achieves a recall of 100%, an accuracy of 99.95%, and an inference time of 0.1723 s. The framework also shows low false positive and false negative rates, distinguishing malicious from benign PDFs with high precision. Shapley additive explanations (SHAP) are used to improve the interpretability and reliability of the model's predictions. The results highlight the effectiveness of the proposed model in improving PDF document security and addressing the challenges posed by malware attacks. © The Author(s) 2024.
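The metrics named in the abstract (recall, accuracy, false positive rate, false negative rate) all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions, treating "malicious" as the positive class. The names `ConfusionCounts` and `detection_metrics` and the counts used are hypothetical, chosen only for illustration; they are not the paper's code or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary detector where 'positive' means malicious."""
    tp: int  # malicious files flagged as malicious
    fp: int  # benign files flagged as malicious
    tn: int  # benign files passed as benign
    fn: int  # malicious files passed as benign


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard metric definitions; assumes both classes are present."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "recall": c.tp / (c.tp + c.fn),
        "precision": c.tp / (c.tp + c.fp),
        "false_positive_rate": c.fp / (c.fp + c.tn),
        "false_negative_rate": c.fn / (c.tp + c.fn),
    }


# Hypothetical counts for illustration only; NOT taken from the paper.
example = ConfusionCounts(tp=500, fp=2, tn=498, fn=0)
metrics = detection_metrics(example)
```

With these illustrative counts, recall is 1.0 because there are no false negatives, so the only errors that lower accuracy are the two false positives. The same relationship holds for any detector that reports 100% recall with accuracy below 100%.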
Pages: 21607-21622
Number of pages: 15