Malicious URL Detection Using Machine Learning

Cited by: 1
Authors
Hani, Dr Raed Bani [1]
Amoura, Motasem [1]
Ammourah, Mohammad [1]
Abu Khalil, Yazeed [1]
Affiliations
[1] Jordan Univ Sci & Technol, Network Engn & Secur Dept, Irbid, Jordan
Source
2024 15TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS, ICICS 2024 | 2024
Keywords
malicious URL detection; Machine Learning; Decision Trees; KNN; ANN; Classification; Binary Classification;
DOI
10.1109/ICICS63486.2024.10638299
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
As cybersecurity threats grow increasingly advanced, strong prediction and detection systems that can distinguish between safe and harmful websites are essential. This paper presents a method that applies several Machine Learning techniques to identify malicious URLs efficiently. Implemented in Python, the proposed method uses a curated dataset of 20,000 URLs collected from 3 sources, with 60 features extracted from each URL and an exact balance of 50% phishing URLs and 50% legitimate URLs. The main aim of the paper is to develop a machine-learning-based system that accurately classifies URLs, contributing to a higher level of security. To this end, the paper investigates the effectiveness of Random Forests (RFs), Decision Trees (DTs), Support Vector Machines (SVMs), k-Nearest Neighbors (KNNs), Logistic Regression, and Artificial Neural Networks (ANNs). The experimental results show that the system performs well. The test accuracy of the Random Forest Classifier reached 99%, demonstrating its ability to separate legitimate and malicious URLs. In addition, the ANN achieved an accuracy of 98%. Overall, five of the six tested algorithms reported an accuracy greater than or equal to 94.5%.
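The abstract mentions that 60 features are extracted from each URL but does not list them. As a rough illustration of what URL-based feature extraction can look like, the sketch below computes a handful of common lexical features with the Python standard library. The function name `extract_features` and the specific features are assumptions chosen for illustration; they are not the paper's feature set.

```python
import re
from urllib.parse import urlparse

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")
# Matches a dotted-quad host such as "192.168.0.1" (no range check on octets).
_IPV4_RE = re.compile(r"(\d{1,3}\.){3}\d{1,3}")


def extract_features(url: str) -> dict:
    """Compute a few illustrative lexical features for one URL.

    Hypothetical feature set: the paper's actual 60 features are not
    given in the abstract.
    """
    # Prepend a scheme when missing so urlparse can locate the host.
    parsed = urlparse(url if _SCHEME_RE.match(url) else "http://" + url)
    host = parsed.hostname or ""
    host_is_ipv4 = _IPV4_RE.fullmatch(host) is not None
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ipv4": int(host_is_ipv4),
        # Dots beyond the one separating the domain from its TLD; 0 for IP hosts.
        "num_subdomains": 0 if host_is_ipv4 else max(host.count(".") - 1, 0),
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    for sample in ("https://example.com/a/b", "http://192.168.0.1/login"):
        print(sample, extract_features(sample))
```

Each URL becomes one fixed-length row of numbers, which is the form that classifiers such as the Random Forests or KNNs named in the abstract would take as input.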
Pages: 5