Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset

Citations: 0
Authors
Etem, Taha [1]
Teke, Mustafa [2]
Affiliations
[1] Cankiri Karatekin Univ, Comp Engn Dept, Cankiri, Turkiye
[2] Cankiri Karatekin Univ, Dept Elect Elect Engn, Cankiri, Turkiye
Source
ACTA INFOLOGICA | 2024 / Volume 8 / Issue 02
Keywords
Machine learning; cybersecurity; feature extraction; data mining
DOI
10.26650/acin.1521835
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Phishing attacks remain a major challenge, and attackers' constantly changing tactics call for sophisticated detection techniques. This paper proposes a method for identifying phishing attempts using the extensive PhiUSIIL dataset, which comprises 134,850 legitimate URLs and 100,945 phishing URLs. We applied the t-SNE technique for feature extraction, condensing the original 51 features into only 2 while preserving high detection accuracy. We evaluated several machine learning algorithms on both the full and the reduced datasets: Logistic Regression, Naive Bayes, k-Nearest Neighbors (kNN), Decision Trees, and Random Forest. The Decision Tree algorithm performed best on the original dataset, achieving 99.7% accuracy, while kNN performed best on the feature-extracted data, achieving 99.2% accuracy. Logistic Regression and Random Forest both improved significantly on the feature-extracted dataset. Because the feature-extracted dataset requires less processing power, the method is computationally efficient and well suited to systems with limited resources. These findings pave the way for more powerful and flexible phishing detection systems that can identify and neutralize emerging threats in real time.
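The pipeline the abstract describes (reduce 51 features to 2 with t-SNE, then classify with kNN) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn, a small synthetic dataset from `make_classification` in place of PhiUSIIL, and assumed settings (`perplexity=30`, `n_neighbors=5`, an 80/20 split) that the abstract does not specify. Note that scikit-learn's `TSNE` has no `transform` method for unseen samples, so the sketch embeds all rows before splitting; the abstract does not say how the authors handled this step.

```python
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the URL feature table (assumption: 51 numeric
# features, binary label). This is NOT the PhiUSIIL data.
X, y = make_classification(
    n_samples=600, n_features=51, n_informative=10, random_state=0
)

# Standardize so that no single feature dominates the pairwise distances.
X_scaled = StandardScaler().fit_transform(X)

# Reduce 51 features to 2. t-SNE does not use the labels, but it has no
# out-of-sample transform, so every row is embedded in one call.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_2d = tsne.fit_transform(X_scaled)

# Hold out 20% of the embedded points for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X_2d, y, test_size=0.2, stratify=y, random_state=0
)

# kNN on the 2-D embedding (k=5 is an assumed value).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
print(f"kNN accuracy on the 2-D embedding: {accuracy:.3f}")
```

Because the embedding is computed jointly over training and test rows, a score obtained this way describes a transductive setting; scoring URLs that arrive later would need either a re-embedding or a parametric variant of t-SNE.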
Pages: 213-221
Page count: 9