Explainable Machine Learning for Bag of Words-Based Phishing Detection

Citations: 1
Authors
Calzarossa, Maria Carla [1]
Giudici, Paolo [2]
Zieni, Rasha [1]
Affiliations
[1] Univ Pavia, Dept Elect Comp & Biomed Engn, Pavia, Italy
[2] Univ Pavia, Dept Econ & Management, Pavia, Italy
Source
EXPLAINABLE ARTIFICIAL INTELLIGENCE, XAI 2023, PT I | 2023 / Vol. 1901
Keywords
Explainable machine learning; Phishing detection; Lorenz Zonoid
DOI
10.1007/978-3-031-44064-9_28
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Phishing is a fraudulent practice aimed at convincing individuals to reveal sensitive information, such as account credentials or credit card details, by clicking the links of malicious websites. To reduce the impact of phishing, the timely identification of these websites is essential, and machine learning models are often devised for this purpose. In this paper, we address the problem of website phishing detection by proposing an explainable machine learning model based on bag-of-words features extracted from the content of the webpages. To select the most important features to be used in the model, we propose to employ the Lorenz Zonoid, the multidimensional generalization of the Gini coefficient. The resulting model achieves good accuracy and provides explanations of which words are most likely associated with phishing websites. In addition, the number of features retained is significantly reduced, thus making the model parsimonious and easier to interpret.
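The two building blocks named in the abstract can be illustrated with a minimal sketch: a bag-of-words count vector per page, and the Gini coefficient, which the Lorenz Zonoid reduces to in one dimension. This is not the authors' implementation. The pages, the vocabulary, and the function names `bag_of_words` and `gini` are hypothetical, and the concentration measure is applied to raw word counts purely to show the computation, whereas a Lorenz-Zonoid-based selection would typically evaluate model predictions.

```python
from collections import Counter
import re


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in vocabulary]


def gini(values):
    """Gini coefficient of a list of non-negative values.

    Uses the sorted-values formula
    G = 2 * sum(i * y_(i)) / (n * sum(y)) - (n + 1) / n.
    """
    ys = sorted(values)
    n = len(ys)
    total = sum(ys)
    if n == 0 or total == 0:
        return 0.0  # no mass to concentrate
    weighted = sum(i * y for i, y in enumerate(ys, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical toy data, invented for illustration only.
pages = [
    "Verify your account: enter login and password to verify",
    "Latest news and more news from the city",
    "Login to read the news",
    "Weather report",
]
vocabulary = ["login", "password", "verify", "news"]

# One count vector per page, then one concentration value per word.
X = [bag_of_words(page, vocabulary) for page in pages]
concentration = {
    word: gini([row[j] for row in X]) for j, word in enumerate(vocabulary)
}

for word, value in concentration.items():
    print(f"{word}: {value:.3f}")
```

A word whose occurrences are concentrated in few pages gets a value closer to 1, while a word spread evenly across pages gets a value closer to 0.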
Pages: 531-543
Page count: 13
Related papers
22 in total
  • [1] Blum A., 2010, P 3 ACM WORKSH ART I, P54
  • [2] Bracke P., 2019, Staff Working Paper, V816
  • [3] Explainable Machine Learning in Credit Risk Management
    Bussmann, Niklas
    Giudici, Paolo
    Marinelli, Dimitri
    Papenbrock, Jochen
    [J]. COMPUTATIONAL ECONOMICS, 2021, 57 (01) : 203 - 216
  • [4] Explainable machine learning for phishing feature detection
    Calzarossa, Maria Carla
    Giudici, Paolo
    Zieni, Rasha
    [J]. QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2024, 40 (01) : 362 - 373
  • [5] DeltaPhish: Detecting Phishing Webpages in Compromised Websites
    Corona, Igino
    Biggio, Battista
    Contini, Matteo
    Piras, Luca
    Corda, Roberto
    Mereu, Mauro
    Mureddu, Guido
    Ariu, Davide
    Roli, Fabio
    [J]. COMPUTER SECURITY - ESORICS 2017, PT I, 2018, 10492 : 370 - 388
  • [6] Shapley-Lorenz eXplainable Artificial Intelligence
    Giudici, Paolo
    Raffinetti, Emanuela
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 167
  • [7] Lorenz Model Selection
    Giudici, Paolo
    Raffinetti, Emanuela
    [J]. JOURNAL OF CLASSIFICATION, 2020, 37 (03) : 754 - 768
  • [8] Phishing Detection Using URL-based XAI Techniques
    Hernandes Jr, Paulo R. Galego
    Floret, Camila P.
    de Almeida, Katia F. Cardozo
    da Silva, Vinicius Camargo
    Papa, João Paulo
    da Costa, Kelton A. Pontara
    [J]. 2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021,
  • [9] A machine learning based approach for phishing detection using hyperlinks information
    Jain, Ankit Kumar
    Gupta, B. B.
    [J]. JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2019, 10 (05) : 2015 - 2028
  • [10] The Lorenz zonoid of a multivariate distribution
    Koshevoy, G
    Mosler, K
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1996, 91 (434) : 873 - 882