A new hybrid ensemble feature selection framework for machine learning-based phishing detection system

Cited by: 192
Authors
Chiew, Kang Leng [1]
Tan, Choon Lin [1]
Wong, KokSheik [2]
Yong, Kelvin S. C. [3]
Tiong, Wei King [1]
Affiliations
[1] Univ Malaysia Sarawak, Fac Comp Sci & Informat Technol, Kota Samarahan 94300, Sarawak, Malaysia
[2] Monash Univ Malaysia, Sch Informat Technol, Bandar Sunway 47500, Selangor, Malaysia
[3] Curtin Univ, Dept Elect & Comp Engn, Fac Engn & Sci, CDT 250, Miri 98009, Sarawak, Malaysia
Keywords
Phishing detection; Feature selection; Machine learning; Ensemble-based; Classification; Phishing dataset
DOI
10.1016/j.ins.2019.01.064
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a new feature selection framework for machine learning-based phishing detection systems, called Hybrid Ensemble Feature Selection (HEFS). In the first phase of HEFS, a novel Cumulative Distribution Function gradient (CDF-g) algorithm is used to produce primary feature subsets, which are then fed into a data perturbation ensemble to yield secondary feature subsets. The second phase derives a set of baseline features from the secondary feature subsets by using a function perturbation ensemble. The overall experimental results suggest that HEFS performs best when it is integrated with a Random Forest classifier, where the baseline features correctly distinguish 94.6% of phishing and legitimate websites using only 20.8% of the original features. In another experiment, the baseline features (10 in total) used with Random Forest outperform the set of all features (48 in total) used with the SVM, Naive Bayes, C4.5, JRip, and PART classifiers. HEFS also shows promising results when benchmarked on another well-known phishing dataset from the University of California Irvine (UCI) repository. Hence, HEFS is a highly desirable and practical feature selection technique for machine learning-based phishing detection systems. (C) 2019 Elsevier Inc. All rights reserved.
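The two-phase structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the CDF-g algorithm, so a fixed top-k cutoff stands in for it, and the two filter measures, the intersection and union rules used to combine subsets, the synthetic data, and all function names are assumptions made for illustration only.

```python
import random
import statistics

def make_data(n_rows=400, n_features=12, n_informative=3, seed=0):
    """Synthetic stand-in data: the first n_informative columns shift with the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def abs_correlation(column, y):
    """Filter measure 1 (assumed): absolute Pearson correlation with the label."""
    mx, my = statistics.fmean(column), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(column, y))
    sx = sum((a - mx) ** 2 for a in column) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (sx * sy)) if sx and sy else 0.0

def mean_gap(column, y):
    """Filter measure 2 (assumed): absolute gap between the two class means."""
    pos = [a for a, b in zip(column, y) if b == 1]
    neg = [a for a, b in zip(column, y) if b == 0]
    if not pos or not neg:
        return 0.0
    return abs(statistics.fmean(pos) - statistics.fmean(neg))

def top_k(X, y, measure, k):
    """Primary subset: rank features by the measure and keep the k best.
    The fixed cutoff k is a placeholder for the paper's CDF-g cutoff."""
    n_features = len(X[0])
    scores = [measure([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])

def data_perturbation(X, y, measure, k, n_parts):
    """Secondary subset: features selected on every data partition (assumed intersection)."""
    subsets = [top_k(X[p::n_parts], y[p::n_parts], measure, k)
               for p in range(n_parts)]
    return set.intersection(*subsets)

def hefs_like(X, y, measures, k=3, n_parts=4):
    """Baseline features: combine secondary subsets across measures (assumed union)."""
    secondary = [data_perturbation(X, y, m, k, n_parts) for m in measures]
    return sorted(set.union(*secondary))

X, y = make_data()
baseline = hefs_like(X, y, [abs_correlation, mean_gap])
print("baseline feature indices:", baseline)

# Arithmetic behind the abstract's figure: 10 baseline features out of 48.
share = round(100 * 10 / 48, 1)
print("share of original features kept (%):", share)
```

Varying the data (partitions) tests whether a feature is selected consistently across samples, while varying the measure tests whether it is selected consistently across ranking criteria; the sketch keeps the two ensembles as separate functions so either combination rule can be swapped out.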
Pages: 153-166
Number of pages: 14