An Improved Method of Phishing URL Detection Using Machine Learning

Cited by: 0
Authors
Sugantham, Amy Joyce, V [1]
Mishra, Pradeepta [1]
Agarwal, Rashmi [1]
Affiliation
[1] REVA Univ, REVA Acad Corp Excellence RACE, Bengaluru, India
Source
SMART TRENDS IN COMPUTING AND COMMUNICATIONS, VOL 5, SMARTCOM 2024 | 2024 / Vol. 949
Keywords
Phishing URL; Machine Learning; Feature engineering; Random Forest algorithm; Logistic regression; Classification;
DOI
10.1007/978-981-97-1313-4_21
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Internet has become an integral part of our lives over the past few years, providing us with essential services. However, the increasing dependency on the web has also led to a surge in cyber-attacks and fraudulent activities, making it crucial to identify malicious websites. Phishing attacks are the leading cause of internet data breaches. According to the FBI, these attacks are expected to increase each year. Shockingly, only 57% of organizations have URL protection in place. Successful phishing attempts can result in data loss, system compromise, and ransomware. Phishing attacks most often target financial companies, social media firms, software-as-a-service companies, and retail sellers. One of the most critical factors in determining whether a website is safe is its Uniform Resource Locator (URL). Despite numerous measures taken by cybersecurity experts to identify phishing URLs, attackers keep finding new ways to breach existing anti-phishing defenses. To combat this growing threat, an improved approach to detecting phishing URLs is proposed. A dataset from the Security Repository consisting of both normal and malicious URLs is used, and five supervised Machine Learning algorithms are applied to it. Fourteen important attributes contributing to a phishing URL are extracted by feature engineering. To test the URLs, a DNS toolkit called DNSPython, which queries and resolves name servers, is used, and the DNS records of the URLs are used as the target variable. Additionally, a web interface is built using Flask with the attributes from the best-performing classifier to show the prediction for each URL, providing a user-friendly and efficient way to identify malicious websites. It is concluded that the Random Forest algorithm provided the highest accuracy score, 96.38%, among all models. The proposed model has proved to be very effective in detecting phishing URLs.
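The feature-engineering step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the fourteen attributes, so the features below (and the names `extract_url_features` and `host_is_ip`) are hypothetical examples of common lexical URL features, not the authors' feature set.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip(host: str) -> bool:
    """Return True when the host part is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Compute a few illustrative lexical features (assumed, not the paper's fourteen)."""
    # Prepend a scheme when missing so that urlparse can locate the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip(host),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


if __name__ == "__main__":
    for sample in ("https://example.com", "http://192.168.0.1/login/verify"):
        print(sample, extract_url_features(sample))
```

A dictionary like this, computed per URL, could be turned into a feature matrix for a supervised classifier such as Random Forest; the labels would come from a separate source, which the abstract describes as DNS records resolved with DNSPython.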
Pages: 245-254
Page count: 10