A new hybrid ensemble feature selection framework for machine learning-based phishing detection system

Cited by: 192
Authors
Chiew, Kang Leng [1]
Tan, Choon Lin [1]
Wong, KokSheik [2]
Yong, Kelvin S. C. [3]
Tiong, Wei King [1]
Affiliations
[1] Univ Malaysia Sarawak, Fac Comp Sci & Informat Technol, Kota Samarahan 94300, Sarawak, Malaysia
[2] Monash Univ Malaysia, Sch Informat Technol, Bandar Sunway 47500, Selangor, Malaysia
[3] Curtin Univ, Dept Elect & Comp Engn, Fac Engn & Sci, CDT 250, Miri 98009, Sarawak, Malaysia
Keywords
Phishing detection; Feature selection; Machine learning; Ensemble-based; Classification; Phishing dataset
DOI
10.1016/j.ins.2019.01.064
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a new feature selection framework for machine learning-based phishing detection systems, called the Hybrid Ensemble Feature Selection (HEFS). In the first phase of HEFS, a novel Cumulative Distribution Function gradient (CDF-g) algorithm is exploited to produce primary feature subsets, which are then fed into a data perturbation ensemble to yield secondary feature subsets. The second phase derives a set of baseline features from the secondary feature subsets by using a function perturbation ensemble. The overall experimental results suggest that HEFS performs best when it is integrated with the Random Forest classifier, where the baseline features correctly distinguish 94.6% of phishing and legitimate websites using only 20.8% of the original features. In another experiment, the baseline features (10 in total) utilised on Random Forest outperform the set of all features (48 in total) used on SVM, Naive Bayes, C4.5, JRip, and PART classifiers. HEFS also shows promising results when benchmarked using another well-known phishing dataset from the University of California Irvine (UCI) repository. Hence, HEFS is a highly desirable and practical feature selection technique for machine learning-based phishing detection systems. © 2019 Elsevier Inc. All rights reserved.
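The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify how CDF-g picks its cutoff, so a fixed top-k cutoff stands in for it here. The two filter measures (`pearson_score`, `mean_gap_score`), the use of random data partitions, and the aggregation rules (intersection across partitions, union across measures) are likewise assumptions made only for illustration, as is the synthetic data.

```python
import random
import statistics


def pearson_score(column, labels):
    """Hypothetical filter measure: absolute Pearson correlation with the labels."""
    mx, my = statistics.fmean(column), statistics.fmean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    var_x = sum((x - mx) ** 2 for x in column)
    var_y = sum((y - my) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def mean_gap_score(column, labels):
    """Hypothetical filter measure: gap between class means, scaled by spread."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    spread = statistics.pstdev(column)
    if not pos or not neg or spread == 0:
        return 0.0
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread


def primary_subset(rows, labels, score_fn, k):
    """Rank features with one measure and keep the top k.

    Assumption: top-k is a stand-in for the CDF-g cutoff, which is not
    described in the abstract.
    """
    n_features = len(rows[0])
    scores = [score_fn([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def data_perturbation(rows, labels, score_fn, k, n_parts, rng):
    """Secondary subset: features selected on every random partition (assumed rule)."""
    order = list(range(len(rows)))
    rng.shuffle(order)
    parts = [order[i::n_parts] for i in range(n_parts)]
    subsets = [
        primary_subset([rows[i] for i in part], [labels[i] for i in part], score_fn, k)
        for part in parts
    ]
    return set.intersection(*subsets)


def function_perturbation(rows, labels, score_fns, k, n_parts, rng):
    """Baseline features: union of secondary subsets over all measures (assumed rule)."""
    baseline = set()
    for fn in score_fns:
        baseline |= data_perturbation(rows, labels, fn, k, n_parts, rng)
    return baseline


def make_toy_data(n_rows, n_features, rng):
    """Synthetic data: features 0 and 1 track the label, the rest are noise."""
    rows, labels = [], []
    for _ in range(n_rows):
        y = rng.randint(0, 1)
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * y
        row[1] -= 3.0 * y
        rows.append(row)
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    rng = random.Random(0)
    rows, labels = make_toy_data(600, 8, rng)
    baseline = function_perturbation(
        rows, labels, [pearson_score, mean_gap_score], k=2, n_parts=3, rng=rng
    )
    print(sorted(baseline))
```

The reduction ratio quoted in the abstract follows directly from the feature counts: 10 baseline features out of 48 original features is 10 / 48 ≈ 20.8%.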
Pages: 153-166
Number of pages: 14
Related papers
50 in total
[21]   Evaluating Machine Learning-Based Feature Selection Methods for Diagnosing Parkinson's Disease Under the SVM Framework [J].
Thirapanish, Wiput ;
Kantavat, Pittipol ;
Wanvarie, Dittaya ;
Chuangsuwanich, Ekapol ;
Punyabukkana, Proadpran .
2024 7TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA, ICAIBD 2024, 2024, :409-415
[22]   A Machine Learning-Based Framework with Enhanced Feature Selection and Resampling for Improved Intrusion Detection [J].
Malik, Fazila ;
Khan, Qazi Waqas ;
Rizwan, Atif ;
Alnashwan, Rana ;
Atteia, Ghada .
MATHEMATICS, 2024, 12 (12)
[23]   Ensemble Learning-Based Wine Quality Prediction Using Optimized Feature Selection and XGBoost [J].
Tyagi, Sonam ;
Rajput, Ishwari Singh ;
Kumar, Bhawnesh ;
Negi, Harendra Singh .
INTERNATIONAL JOURNAL OF MATHEMATICAL ENGINEERING AND MANAGEMENT SCIENCES, 2025, 10 (05) :1621-1639
[24]   A feature selection-driven machine learning framework for anomaly-based intrusion detection systems [J].
Emirmahmutoglu, Emre ;
Atay, Yilmaz .
PEER-TO-PEER NETWORKING AND APPLICATIONS, 2025, 18 (03)
[25]   Machine Learning-Based Feature Selection and Classification for the Experimental Diagnosis of Trypanosoma cruzi [J].
Hevia-Montiel, Nidiyare ;
Perez-Gonzalez, Jorge ;
Neme, Antonio ;
Haro, Paulina .
ELECTRONICS, 2022, 11 (05)
[26]   Modeling Hybrid Feature-Based Phishing Websites Detection Using Machine Learning Techniques [J].
Das Guptta S. ;
Shahriar K.T. ;
Alqahtani H. ;
Alsalman D. ;
Sarker I.H. .
Annals of Data Science, 2024, 11 (01) :217-242
[27]   Automatic Feature Extraction and Selection For Machine Learning Based Intrusion Detection [J].
Liu, Jinjie ;
Chung, Sun Sunnie .
2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019, :1400-1405
[28]   Network Intrusion Detection with Two-Phased Hybrid Ensemble Learning and Automatic Feature Selection [J].
Mananayaka, Asanka Kavinda ;
Chung, Sun Sunnie .
IEEE ACCESS, 2023, 11 :45154-45167
[29]   Detection of Phishing Websites from URLs Using Hybrid Ensemble-Based Machine Learning Technique [J].
Agagu, Modupe ;
Ogunbiyi, Ibrahin Abayomi ;
Lasisi, Ayodele ;
Omorogiuwa, Osaremwinda .
RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING, SCDM 2024, 2024, 1078 :11-22
[30]   An Effective Feature Selection Algorithm for Machine Learning-based Malicious Traffic Detection [J].
Fei, Chao ;
Xia, Nian ;
Tsai, Pang-Wei ;
Lu, Yang ;
Pan, Xiaonan ;
Gong, Junli .
2024 19TH ASIA JOINT CONFERENCE ON INFORMATION SECURITY, ASIAJCIS 2024, 2024, :91-98