Prediction of Phishing Websites Using Machine Learning Techniques

Cited by: 0
Authors
Laxman, Harith [1]
Prasad, Eswara [1]
Aravinth, R. [1]
Anish, C. [1]
Sudha, S. [1]
Affiliations
[1] Natl Inst Technol, Elect & Elect Eng, Tiruchirappalli, India
Source
2024 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND EMERGING COMMUNICATION TECHNOLOGIES, ICEC | 2024
Keywords
Phishing detection; Cybersecurity; Machine Learning; Content-text classification; Visual Similarity;
DOI
10.1109/ICEC59683.2024.10837329
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A phishing attack is a cyber attack in which a fake website poses as a trusted one to steal sensitive information, such as passwords and credit card details, for misuse. Phishing attacks have increased over time, and distinguishing a phishing site from a legitimate one is becoming increasingly difficult. Checking the legitimacy of visited web pages is therefore crucial for protecting customers' identities and preventing phishing attacks. The literature is rich with machine learning-based techniques for detecting phishing websites. However, existing methods tend to focus on a single feature set, which makes them prone to misclassification when deployed in real time. In this work, we propose a novel 3-level pipeline architecture that detects phishing websites using machine learning techniques. The proposed architecture combines three detection methods: domain-based, content-based, and visual similarity-based. The domain-based method classifies websites using domain details such as the URL, IP address, and Google indexing, whereas the content-based model classifies a page by its text content. The visual similarity model compares a screenshot of the suspected phishing website with the closest legitimate website. XGBoost, Logistic Regression, and a Triplet Network gave the best performance for the domain-based, content-based, and visual similarity models, respectively. The proposed three-layer architecture is implemented along with a Google Chrome extension to enable real-time detection. The proposed model achieved 96.429% accuracy and 98.989% recall on the considered real-time dataset and performed better than existing single-layer architecture-based methods.
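The abstract names the three detection levels but does not say how their outputs are combined. As a rough illustration only, the sketch below assumes an early-exit cascade: each stage returns a phishing probability, and the pipeline stops at the first stage that is confident either way. The thresholds, the `classify` function, and the three stand-in stage functions are all hypothetical; they are placeholders for trained models, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A stage maps a URL to a phishing probability in [0, 1].
Stage = Callable[[str], float]


@dataclass
class CascadeResult:
    label: str        # "phishing" or "legitimate"
    decided_by: str   # name of the stage that made the final call
    score: float      # that stage's phishing probability


def classify(url: str, stages: List[Tuple[str, Stage]],
             low: float = 0.2, high: float = 0.8) -> CascadeResult:
    """Run stages in order and stop at the first confident score.

    ASSUMPTION: the early-exit rule and both thresholds are illustrative;
    the source does not describe how the levels are combined.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    name, score = "", 0.5
    for name, stage in stages:
        score = stage(url)
        if score >= high:
            return CascadeResult("phishing", name, score)
        if score <= low:
            return CascadeResult("legitimate", name, score)
    # No stage was confident: fall back to the last stage's score.
    label = "phishing" if score >= 0.5 else "legitimate"
    return CascadeResult(label, name, score)


# Hypothetical stand-ins for the trained models (XGBoost, Logistic
# Regression, and Triplet Network in the paper); real stages would
# extract features and call a fitted model.
def domain_stage(url: str) -> float:
    return 0.9 if url.startswith("http://") and "login" in url else 0.5


def content_stage(url: str) -> float:
    return 0.1 if "example.org" in url else 0.6


def visual_stage(url: str) -> float:
    return 0.55


STAGES = [("domain", domain_stage),
          ("content", content_stage),
          ("visual", visual_stage)]

if __name__ == "__main__":
    for u in ("http://secure-login.test/x",
              "https://example.org/",
              "https://unknown.test/"):
        print(u, classify(u, STAGES))
```

One reason to order the stages this way, under the same assumption, is cost: domain features are cheap to compute, while a screenshot comparison is the most expensive step and would run only for pages the earlier stages could not settle.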
Pages: 743-751
Page count: 9