A new hybrid ensemble feature selection framework for machine learning-based phishing detection system

Cited by: 192
Authors
Chiew, Kang Leng [1]
Tan, Choon Lin [1]
Wong, KokSheik [2]
Yong, Kelvin S. C. [3]
Tiong, Wei King [1]
Affiliations
[1] Univ Malaysia Sarawak, Fac Comp Sci & Informat Technol, Kota Samarahan 94300, Sarawak, Malaysia
[2] Monash Univ Malaysia, Sch Informat Technol, Bandar Sunway 47500, Selangor, Malaysia
[3] Curtin Univ, Dept Elect & Comp Engn, Fac Engn & Sci, CDT 250, Miri 98009, Sarawak, Malaysia
Keywords
Phishing detection; Feature selection; Machine learning; Ensemble-based; Classification; Phishing dataset
DOI
10.1016/j.ins.2019.01.064
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a new feature selection framework for machine learning-based phishing detection systems, called the Hybrid Ensemble Feature Selection (HEFS). In the first phase of HEFS, a novel Cumulative Distribution Function gradient (CDF-g) algorithm is exploited to produce primary feature subsets, which are then fed into a data perturbation ensemble to yield secondary feature subsets. The second phase derives a set of baseline features from the secondary feature subsets by using a function perturbation ensemble. The overall experimental results suggest that HEFS performs best when it is integrated with the Random Forest classifier, where the baseline features correctly distinguish 94.6% of phishing and legitimate websites using only 20.8% of the original features. In another experiment, the baseline features (10 in total) utilised on Random Forest outperform the set of all features (48 in total) used on the SVM, Naive Bayes, C4.5, JRip, and PART classifiers. HEFS also shows promising results when benchmarked using another well-known phishing dataset from the University of California Irvine (UCI) repository. Hence, HEFS is a highly desirable and practical feature selection technique for machine learning-based phishing detection systems. © 2019 Elsevier Inc. All rights reserved.
Pages: 153-166
Page count: 14
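
The two ensemble stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' HEFS implementation. The CDF-g cutoff is not specified in this record, so a fixed top-k cutoff stands in for it. The two filter scorers, the majority-vote rule, the intersection rule, and the synthetic data are all assumptions chosen for illustration. "Data perturbation" here means running one scorer on several random subsamples; "function perturbation" means combining the subsets produced by different scorers.

```python
import random
from collections import Counter

def abs_correlation(xs, ys):
    """Hypothetical filter scorer 1: absolute Pearson correlation with the label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def scaled_mean_gap(xs, ys):
    """Hypothetical filter scorer 2: gap between class means, scaled by feature range."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    span = max(xs) - min(xs)
    if not pos or not neg or span == 0:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / span

def top_k(rows, labels, scorer, k):
    """Primary subset: the k best-scoring feature indices (stand-in for CDF-g)."""
    n_features = len(rows[0])
    scores = [scorer([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])

def data_perturbation(rows, labels, scorer, k, n_rounds, rng):
    """Secondary subset: features picked in a majority of random half-samples."""
    votes = Counter()
    for _ in range(n_rounds):
        idx = rng.sample(range(len(rows)), len(rows) // 2)
        votes.update(top_k([rows[i] for i in idx], [labels[i] for i in idx], scorer, k))
    return {j for j, v in votes.items() if v > n_rounds / 2}

def function_perturbation(rows, labels, scorers, k, n_rounds, rng):
    """Baseline features: intersection of the secondary subsets of all scorers."""
    subsets = [data_perturbation(rows, labels, s, k, n_rounds, rng) for s in scorers]
    return set.intersection(*subsets)

# Synthetic data (assumed): features 0 and 1 track the label, features 2-5 are noise.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(200)]
rows = [
    [y + rng.gauss(0, 0.3), 1 - y + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(4)]
    for y in labels
]
baseline = function_perturbation(
    rows, labels, [abs_correlation, scaled_mean_gap], k=2, n_rounds=9, rng=rng
)

# The abstract's feature ratio: 10 baseline features out of 48 original features.
kept_fraction = 10 / 48
print(sorted(baseline), f"{kept_fraction:.1%}")
```

The last lines check the abstract's arithmetic: 10 of 48 features is 10/48, which rounds to the reported 20.8%.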