RSTHFS: A Rough Set Theory-Based Hybrid Feature Selection Method for Phishing Website Classification

Cited by: 0
Authors
Setu, Jahanggir Hossain [1]
Halder, Nabarun [1]
Islam, Ashraful [1]
Amin, M. Ashraful [1]
Affiliations
[1] Independent Univ, Ctr Computat & Data Sci, Dhaka 1229, Bangladesh
Keywords
Phishing; Feature extraction; Accuracy; Support vector machines; Classification algorithms; Runtime; Classification tree analysis; Rough sets; Radio frequency; Principal component analysis; Cyber security; feature selection; hybrid feature; machine learning; phishing; phishing websites; rough set theory; RSTHFS
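DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract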
Phishing is a pervasive form of cybercrime in which malicious websites deceive users into revealing sensitive information such as passwords and credit card details. Despite advances in cybersecurity, accurately detecting phishing websites remains challenging because no universally accepted identification parameters exist. This study introduces a feature selection method, Rough Set Theory-based Hybrid Feature Selection (RSTHFS), to improve phishing website detection with Machine Learning (ML) techniques. The approach was evaluated on three datasets containing 2,456, 10,000, and 88,647 instances. RSTHFS maintained an average accuracy of 95.48% while reducing the number of features by 69.11% on average. Performance was further assessed with three classifiers: Light Gradient-Boosting Machine (LightGBM), Random Forest (RF), and Categorical Boosting (CatBoost), of which CatBoost achieved the highest accuracy. RSTHFS also reduced runtime by 61.43%. These findings indicate that RSTHFS both identifies phishing websites effectively and speeds up ML processing, offering a reliable and fast approach to feature selection. The work contributes a methodology that improves the accuracy and speed of phishing detection systems.
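The abstract does not spell out the internals of RSTHFS, so the following is only a generic illustration of the rough-set idea that such methods build on, not the authors' algorithm. It is a minimal sketch of the rough-set dependency degree (the fraction of instances whose feature values determine the class label unambiguously) and a greedy, QuickReduct-style search for a small feature subset that preserves that dependency. The function names `dependency_degree` and `quick_reduct` and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def dependency_degree(rows, labels, attrs):
    """Fraction of rows whose equivalence class under `attrs` has one label.

    This is the rough-set dependency of the decision on the attribute
    subset: the size of the positive region divided by the number of rows.
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)  # rows with equal keys are indiscernible
        classes[key].append(label)
    positive = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add the attribute that raises the dependency degree most.

    Stops when the subset matches the dependency of the full attribute set,
    or when no single attribute improves it (the greedy search can stall,
    for example on XOR-like interactions).
    """
    all_attrs = list(range(len(rows[0])))
    target = dependency_degree(rows, labels, all_attrs)
    selected = []
    current = dependency_degree(rows, labels, selected)
    while current < target:
        best_attr, best_gamma = None, current
        for a in all_attrs:
            if a in selected:
                continue
            gamma = dependency_degree(rows, labels, selected + [a])
            if gamma > best_gamma:
                best_attr, best_gamma = a, gamma
        if best_attr is None:
            break
        selected.append(best_attr)
        current = best_gamma
    return selected


# Hypothetical toy data: four binary features, label equals feature 0.
rows = [
    (1, 0, 1, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 1, 0, 1),
    (1, 0, 0, 0),
    (0, 1, 1, 1),
]
labels = [1, 1, 0, 0, 1, 0]

reduct = quick_reduct(rows, labels)
print(reduct)
```

On this toy table a single feature already determines the label, so the search keeps one of the four features. A real pipeline would then train a classifier on the reduced feature set and compare accuracy and runtime against the full set.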
Pages: 68820-68830
Page count: 11