A high-accuracy phishing website detection method based on machine learning

Cited by: 9
Authors
Bahaghighat, Mahdi [1]
Ghasemi, Majid [1]
Ozen, Figen [2]
Affiliations
[1] Imam Khomeini Int Univ, Dept Comp Engn, Qazvin, Iran
[2] Halic Univ, Istanbul, Turkiye
Keywords
Phishing website detection; Cyber security; Machine learning; Classification; XGBoost
DOI
10.1016/j.jisa.2023.103553
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rapid development of e-commerce, e-banking, and social networks has made phishing attack detection one of the most critical technologies in cyber security systems. To improve the efficiency of anti-phishing techniques, we present an improved predictive model based on machine learning. The proposed method uses six different algorithms: Logistic Regression, K-Nearest Neighbors, Naive Bayes, Random Forest, Support Vector Machine, and Extreme Gradient Boosting (XGBoost). The experiments are based on a public dataset of 58,000 legitimate websites and 30,647 phishing ones, with 112 attributes for each sample. Our evaluations in the feature selection process show that balancing the dataset and dropping constant features yields a noticeable improvement. We conducted our evaluation on eight major scenarios. The experimental results of our phishing website detection (PWD) method show that each algorithm reached an accuracy of more than 93%, and that the XGBoost classifier outperforms the others with 99.2% overall accuracy, 99.1% precision, 99.4% recall, and 99.1% specificity. In addition, the run-time of the XGBoost algorithm was about 1500 ms without dimension reduction, while using Principal Component Analysis (PCA) reduces it to 869 ms. As a result, the proposed approach would be practical in both offline and real-time applications.
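The preprocessing steps and the four reported metrics can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The function names are hypothetical, and random undersampling is an assumption, since the abstract does not say how the dataset was balanced. The metric definitions are the standard ones based on confusion-matrix counts.

```python
import random


def drop_constant_features(rows):
    """Remove columns that hold a single value across all rows.

    Returns the filtered rows and the indices of the columns that were kept.
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    kept = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in kept] for row in rows], kept


def undersample(rows, labels, seed=0):
    """Balance classes by randomly undersampling to the minority class size.

    Assumption: the abstract only says the dataset was balanced, not how.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    chosen.sort()  # preserve the original sample order
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Report 0.0 when a denominator is empty instead of raising.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

In a fuller pipeline, the balanced and filtered features would be passed to a classifier such as XGBoost, optionally after PCA, and `binary_metrics` would be applied to its predictions on a held-out test set.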
Pages: 17