Enhancing Phishing Detection Through Ensemble Learning and Cross-Validation

Cited by: 0

Authors:

Jawad, Samer Kadhim [1]

Alnajjar, Satea Hikmat [2]

Affiliations:

[1] Al Iraqia Univ, Comp Engn, Baghdad, Iraq

[2] Al Iraqia Univ, Network Engn, Baghdad, Iraq

Source:

2024 INTERNATIONAL CONFERENCE ON SMART APPLICATIONS, COMMUNICATIONS AND NETWORKING, SMARTNETS-2024 | 2024

Keywords:

Phishing; Machine learning; Ensemble learning; Gradient Boosting Classifier; cross-validation;

DOI:

10.1109/SMARTNETS61466.2024.10577746

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Phishing is among the most worrying issues in a constantly changing world. With the rise in Internet usage, phishing has become a new type of data theft. This type of cybercrime involves the theft of private information and the violation of privacy by exploiting human vulnerabilities and technical smuggling. Phishing through URLs (Uniform Resource Locators) is one of the most common types, and detecting a malicious URL is a major challenge. This study focuses on improving phishing detection through ensemble learning approaches, notably the Gradient Boosting Classifier, CatBoost, and XGBoost algorithms. Leveraging a comprehensive dataset containing examples of both phishing sites and legitimate sites, the study includes exploratory data analysis, data pre-processing, and rigorous model evaluation using cross-validation. The research also includes feature importance analysis, using permutation techniques to reveal the critical factors that influence the models' decision-making. The results demonstrate the effectiveness of ensemble learning in distinguishing between phishing and legitimate sites, with accuracy reaching 98.14% using the Gradient Boosting Classifier with cross-validation, while providing valuable insights into the key features that lead to accurate predictions. This research advances the field of cybersecurity by offering a comprehensive understanding of ensemble learning techniques and their practical applications in fortifying defenses against phishing attempts.
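
The workflow named in the abstract (a gradient boosting classifier scored with cross-validation, followed by permutation-based feature importance) could be sketched roughly as below. This is only an illustration using scikit-learn: the synthetic data from `make_classification`, the feature count, the 5-fold split, and the hyperparameters are all assumptions, since the abstract does not state the paper's dataset or settings, and the sketch makes no attempt to reproduce the reported 98.14% accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# ASSUMPTION: synthetic stand-in for a labelled URL feature table
# (1 = phishing, 0 = legitimate); the paper's real dataset is not used here.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)

# ASSUMPTION: hyperparameters are illustrative, not taken from the paper.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Stratified k-fold cross-validation keeps the class ratio in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()

# Permutation importance: shuffle one feature at a time on held-out data
# and measure how much the accuracy drops.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first

print("fold accuracies:", np.round(scores, 3))
print("mean accuracy:", round(float(mean_accuracy), 3))
print("features ranked by permutation importance:", ranking.tolist())
```

Computing the importances on a held-out split rather than on the training data is a design choice made here so that the ranking reflects features the model relies on for generalization; the abstract does not say which split the authors used.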

Pages: 7
