Interpretable Machine Learning Models for Malicious Domains Detection Using Explainable Artificial Intelligence (XAI)

Cited by: 32

Authors:

Aslam, Nida [1]

Khan, Irfan Ullah [2]

Mirza, Samiha [2]

AlOwayed, Alanoud [2]

Anis, Fatima M. [2]

Aljuaid, Reef M. [2]

Baageel, Reham [2]

Affiliations:

[1] Imam Abdulrahman Bin Faisal Univ, Coll Comp Sci & Informat Technol, SAUDI ARAMCO Cybersecur Chair, POB 1982, Dammam 31441, Saudi Arabia

[2] Imam Abdulrahman Bin Faisal Univ, Coll Comp Sci & Informat Technol, Dept Comp Sci, Dammam 31441, Saudi Arabia

Source:

SUSTAINABILITY | 2022 / Volume 14 / Issue 12

Keywords:

network security; malicious domains; machine learning; ensemble models; explainable artificial intelligence;

DOI:

10.3390/su14127375

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

With the expansion of the internet, a major threat has emerged: the spread of malicious domains that attackers use for illegal activities, including targeting governments, violating the privacy of organizations, and manipulating everyday users. Detecting these harmful domains is therefore necessary to combat the growing number of network attacks. Machine Learning (ML) models have shown significant results in detecting malicious domains. However, the "black box" nature of complex ML models hinders their wide acceptance in some fields. Explainable Artificial Intelligence (XAI) has brought interpretability and explicability to complex models, and post hoc XAI methods provide interpretability without affecting model performance. This study proposes an XAI model to detect malicious domains on a recent dataset containing 45,000 samples of malicious and non-malicious domains. First, several interpretable ML models, such as Decision Tree (DT) and Naive Bayes (NB), and black-box ensemble models, such as Random Forest (RF), Extreme Gradient Boosting (XGB), AdaBoost (AB), and CatBoost (CB), were implemented; XGB outperformed the other classifiers. Next, the post hoc XAI global surrogate model (Shapley additive explanations, SHAP) and the local surrogate LIME were used to explain the XGB predictions. Two sets of experiments were performed: the models were first run on the preprocessed dataset and then on features selected with the Sequential Forward Feature Selection algorithm. The results demonstrate that the ML algorithms distinguished benign from malicious domains with overall accuracy ranging from 0.8479 to 0.9856. The ensemble classifier XGB achieved the highest result, with an AUC of 0.9991 and an accuracy of 0.9856 before feature selection, and an AUC of 0.999 and an accuracy of 0.9818 after feature selection. The proposed model outperformed the benchmark study.
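
The idea behind the Shapley additive explanations named in the abstract can be illustrated with a minimal sketch. It computes exact Shapley attributions for a single prediction by enumerating feature coalitions. Everything in it is an assumption made for illustration: the function `shapley_values`, the toy scorer `toy_score`, its three features (name length, digit ratio, entropy), and the single-baseline treatment of "absent" features are hypothetical and are not taken from the paper, its dataset, or the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions for one prediction.

    Features outside a coalition are set to their baseline value.
    This is an illustrative simplification, not the SHAP library's method.
    """
    n = len(instance)

    def value(coalition):
        # Model output when only the coalition's features take the instance's values.
        mixed = [instance[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Hypothetical toy scorer over three made-up domain features
# (name length, digit ratio, entropy); it stands in for a trained model.
def toy_score(x):
    length, digit_ratio, entropy = x
    return 0.01 * length + 0.5 * digit_ratio + 0.1 * entropy + 0.2 * digit_ratio * entropy


instance = [30.0, 0.4, 3.5]   # the domain being explained (made-up values)
baseline = [12.0, 0.0, 2.0]   # reference point standing in for a "typical" domain
phi = shapley_values(toy_score, instance, baseline)
print(phi)
```

The attributions sum to the difference between the score of the instance and the score of the baseline, which is the additivity property that gives the method its name. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations or model-specific algorithms.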


Pages: 22

References: 41 in total

