A high-accuracy phishing website detection method based on machine learning

Cited by: 9
Authors
Bahaghighat, Mahdi [1]
Ghasemi, Majid [1]
Ozen, Figen [2]
Affiliations
[1] Imam Khomeini Int Univ, Dept Comp Engn, Qazvin, Iran
[2] Halic Univ, Istanbul, Turkiye
Keywords
Phishing website detection; Cyber security; Machine learning; Classification; XGBoost
DOI
10.1016/j.jisa.2023.103553
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The rapid development of e-commerce, e-banking, and social networks has made phishing attack detection one of the most critical technologies in cyber security systems. To improve the efficiency of anti-phishing techniques, we present an improved predictive model based on machine learning. The proposed method uses six different algorithms: Logistic Regression, K-Nearest Neighbors, Naive Bayes, Random Forest, Support Vector Machine, and Extreme Gradient Boosting (XGBoost). The experiments are based on a public dataset of 58,000 legitimate websites and 30,647 phishing ones, with 112 attributes per sample. Our evaluations in the feature selection process show that balancing the dataset and dropping constant features yield a noticeable improvement. We based our evaluation on eight distinct scenarios. The experimental results of our phishing website detection (PWD) method show strong performance: every algorithm reached an accuracy above 93%, and the XGBoost classifier outperformed the others with 99.2% overall accuracy, 99.1% precision, 99.4% recall, and 99.1% specificity. In addition, the XGBoost algorithm achieved a run-time of about 1,500 ms without dimension reduction, while using Principal Component Analysis (PCA) reduces it to 869 ms. As a result, the proposed approach would be practical in both offline and real-time applications.
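The preprocessing and evaluation steps named in the abstract (dropping constant features, balancing the classes, and scoring with accuracy, precision, recall, and specificity) can be sketched in plain Python. This is a minimal stdlib-only sketch under stated assumptions, not the authors' code: the function names are hypothetical, random undersampling of the majority class stands in for the balancing method (which the abstract does not name), and phishing is taken as the positive class (label 1). Specificity here is TN / (TN + FP), the share of legitimate sites that are not flagged.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = List[float]


def drop_constant_features(rows: List[Row]) -> Tuple[List[Row], List[int]]:
    """Drop columns that take the same value in every row.

    Returns the reduced rows and the indices of the columns that were kept.
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    # A column carries information only if it has at least two distinct values.
    keep = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows], keep


def undersample(rows: List[Row], labels: Sequence[int],
                seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Balance classes by randomly discarding samples from the larger classes.

    Assumption: random undersampling; the abstract does not say which
    balancing method was used.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    # Every class is cut down to the size of the smallest class.
    n_min = min(len(idx) for idx in by_class.values())
    chosen: List[int] = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    chosen.sort()  # keep surviving samples in their original order
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and specificity from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num: int, den: int) -> float:
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),    # flagged sites that are truly phishing
        "recall": ratio(tp, tp + fn),       # phishing sites that were caught
        "specificity": ratio(tn, tn + fp),  # legitimate sites left unflagged
    }


if __name__ == "__main__":
    X, kept = drop_constant_features([[1, 0, 5], [1, 1, 5], [1, 0, 5]])
    print(kept)  # → [1]  (columns 0 and 2 are constant)
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

Applied to the dataset described above, undersampling would cut the 58,000 legitimate sites down to match the 30,647 phishing ones; oversampling the minority class is an equally plausible reading of "balancing". Any of the six classifiers would then be trained on the output of these helpers and scored with `binary_metrics`.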
Pages: 17