A new hybrid ensemble feature selection framework for machine learning-based phishing detection system

被引:176
|
作者
Chiew, Kang Leng [1 ]
Tan, Choon Lin [1 ]
Wong, KokSheik [2 ]
Yong, Kelvin S. C. [3 ]
Tiong, Wei King [1 ]
机构
[1] Univ Malaysia Sarawak, Fac Comp Sci & Informat Technol, Kota Samarahan 94300, Sarawak, Malaysia
[2] Monash Univ Malaysia, Sch Informat Technol, Bandar Sunway 47500, Selangor, Malaysia
[3] Curtin Univ, Dept Elect & Comp Engn, Fac Engn & Sci, CDT 250, Miri 98009, Sarawak, Malaysia
关键词
Phishing detection; Feature selection; Machine learning; Ensemble-based; Classification; Phishing dataset; CLASSIFICATION;
D O I
10.1016/j.ins.2019.01.064
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This paper proposes a new feature selection framework for machine learning-based phishing detection system, called the Hybrid Ensemble Feature Selection (HEFS). In the first phase of HEFS, a novel Cumulative Distribution Function gradient (CDF-g) algorithm is exploited to produce primary feature subsets, which are then fed into a data perturbation ensemble to yield secondary feature subsets. The second phase derives a set of baseline features from the secondary feature subsets by using a function perturbation ensemble. The overall experimental results suggest that HEFS performs best when it is integrated with Random Forest classifier, where the baseline features correctly distinguish 94.6% of phishing and legitimate websites using only 20.8% of the original features. In another experiment, the baseline features (10 in total) utilised on Random Forest outperforms the set of all features (48 in total) used on SVM, Naive Bayes, C4.5, JRip, and PART classifiers. HEFS also shows promising results when benchmarked using another well-known phishing dataset from the University of California Irvine (UCI) repository. Hence, the HEFS is a highly desirable and practical feature selection technique for machine learning-based phishing detection systems. (C) 2019 Elsevier Inc. All rights reserved.
引用
收藏
页码:153 / 166
页数:14
相关论文
共 50 条
  • [1] A New Ensemble Model for Phishing Detection Based on Hybrid Cumulative Feature Selection
    Prince, Md Sirajum Munir
    Hasan, Asib
    Shah, Faisal Muhammad
    11TH IEEE SYMPOSIUM ON COMPUTER APPLICATIONS & INDUSTRIAL ELECTRONICS (ISCAIE 2021), 2021, : 7 - 12
  • [2] Phishing detection based on machine learning and feature selection methods
    Almseidin M.
    Abu Zuraiq A.M.
    Al-kasassbeh M.
    Alnidami N.
    International Journal of Interactive Mobile Technologies, 2019, 13 (12) : 71 - 183
  • [3] The hybrid framework of ensemble technique in machine learning for phishing detection
    Mahajan, Akanksha S.
    Navale, Pradnya K.
    Patil, Vaishnavi V.
    Khadse, Vijay M.
    Mahalle, Parikshit N.
    INTERNATIONAL JOURNAL OF INFORMATION AND COMPUTER SECURITY, 2023, 21 (1-2) : 162 - 184
  • [4] Feature Selection Approach for Phishing Detection Based on Machine Learning
    Wei, Yi
    Sekiya, Yuji
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON APPLIED CYBER SECURITY (ACS) 2021, 2022, 378 : 61 - 70
  • [5] Feature Selection For Machine Learning-Based Early Detection of Distributed Cyber Attacks
    Feng, Yaokai
    Akiyama, Hitoshi
    Lu, Liang
    Sakurai, Kouichi
    2018 16TH IEEE INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP, 16TH IEEE INT CONF ON PERVAS INTELLIGENCE AND COMP, 4TH IEEE INT CONF ON BIG DATA INTELLIGENCE AND COMP, 3RD IEEE CYBER SCI AND TECHNOL CONGRESS (DASC/PICOM/DATACOM/CYBERSCITECH), 2018, : 173 - 180
  • [6] Feature selection in single and ensemble learning-based bankruptcy prediction models
    Lin, Wei-Chao
    Lu, Yu-Hsin
    Tsai, Chih-Fong
    EXPERT SYSTEMS, 2019, 36 (01)
  • [7] Enhancing Phishing Detection: A Machine Learning Approach With Feature Selection and Deep Learning Models
    Nayak, Ganesh S.
    Muniyal, Balachandra
    Belavagi, Manjula C.
    IEEE ACCESS, 2025, 13 : 33308 - 33320
  • [8] Phishing Website Detection Using Machine Learning Classifiers Optimized by Feature Selection
    Mehanovic, Dzelila
    Kevric, Jasmin
    TRAITEMENT DU SIGNAL, 2020, 37 (04) : 563 - 569
  • [9] Explainable machine learning for phishing feature detection
    Calzarossa, Maria Carla
    Giudici, Paolo
    Zieni, Rasha
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2024, 40 (01) : 362 - 373
  • [10] Machine learning-based intrusion detection: feature selection versus feature extraction
    Ngo, Vu-Duc
    Vuong, Tuan-Cuong
    Van Luong, Thien
    Tran, Hung
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (03): : 2365 - 2379