A new hybrid ensemble feature selection framework for machine learning-based phishing detection system

Times cited: 176
Authors
Chiew, Kang Leng [1]
Tan, Choon Lin [1]
Wong, KokSheik [2]
Yong, Kelvin S. C. [3]
Tiong, Wei King [1]
Affiliations
[1] Univ Malaysia Sarawak, Fac Comp Sci & Informat Technol, Kota Samarahan 94300, Sarawak, Malaysia
[2] Monash Univ Malaysia, Sch Informat Technol, Bandar Sunway 47500, Selangor, Malaysia
[3] Curtin Univ, Dept Elect & Comp Engn, Fac Engn & Sci, CDT 250, Miri 98009, Sarawak, Malaysia
Keywords
Phishing detection; Feature selection; Machine learning; Ensemble-based; Classification; Phishing dataset
DOI
10.1016/j.ins.2019.01.064
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper proposes a new feature selection framework for machine learning-based phishing detection systems, called Hybrid Ensemble Feature Selection (HEFS). In the first phase of HEFS, a novel Cumulative Distribution Function gradient (CDF-g) algorithm is used to produce primary feature subsets, which are then fed into a data perturbation ensemble to yield secondary feature subsets. The second phase derives a set of baseline features from the secondary feature subsets by using a function perturbation ensemble. The overall experimental results suggest that HEFS performs best when integrated with a Random Forest classifier, where the baseline features correctly distinguish 94.6% of phishing and legitimate websites using only 20.8% of the original features. In another experiment, the baseline features (10 in total) used with Random Forest outperform the full feature set (48 in total) used with the SVM, Naive Bayes, C4.5, JRip, and PART classifiers. HEFS also shows promising results when benchmarked on another well-known phishing dataset from the University of California, Irvine (UCI) repository. Hence, HEFS is a highly desirable and practical feature selection technique for machine learning-based phishing detection systems. © 2019 Elsevier Inc. All rights reserved.
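The abstract describes HEFS only at the level of its two phases. The Python sketch below illustrates that two-level ensemble structure under loudly stated assumptions. The two filter scores (`abs_correlation` and a Fisher-style `fisher_score`), the cutoff rule `cdf_gradient_cutoff`, the bootstrap resampling, and majority voting are hypothetical choices made for illustration. They are not the paper's exact CDF-g algorithm, filter measures, or aggregation rules. The cutoff treats normalised, descending feature scores as the slope of a cumulative score curve and cuts at the sharpest drop in that slope.

```python
import math
import random
import statistics
from collections import Counter

# Hypothetical sketch: none of these names or rules come from the HEFS paper.


def abs_correlation(column, labels):
    """Absolute Pearson correlation between one feature column and 0/1 labels."""
    mx, my = statistics.fmean(column), statistics.fmean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    vx = sum((x - mx) ** 2 for x in column)
    vy = sum((y - my) ** 2 for y in labels)
    return abs(cov) / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def fisher_score(column, labels):
    """Squared gap between class means divided by the summed class variances."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    gap = statistics.fmean(pos) - statistics.fmean(neg)
    spread = statistics.pvariance(pos) + statistics.pvariance(neg)
    return gap ** 2 / (spread + 1e-12)


def cdf_gradient_cutoff(scores):
    """Simplified stand-in for CDF-g: keep top-ranked features up to the
    sharpest drop in the slope of the cumulative (normalised) score curve."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(scores) or 1.0
    slopes = [scores[i] / total for i in order]  # slope of the CDF at each rank
    if len(slopes) < 2:
        return set(order)
    drops = [slopes[k] - slopes[k + 1] for k in range(len(slopes) - 1)]
    cut = max(range(len(drops)), key=drops.__getitem__) + 1
    return set(order[:cut])


def majority(n):
    """Smallest vote count that is a strict majority of n voters."""
    return n // 2 + 1


def vote(subsets, min_votes):
    """Features that appear in at least `min_votes` of the given subsets."""
    counts = Counter(f for s in subsets for f in s)
    return {f for f, c in counts.items() if c >= min_votes}


def secondary_subset(rows, labels, score_fn, n_boot=5, seed=0):
    """Phase 1 (data perturbation): one primary subset per bootstrap sample,
    combined into a secondary subset by majority vote."""
    rng = random.Random(seed)
    primary = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        sample_cols = list(zip(*(rows[i] for i in idx)))
        sample_labels = [labels[i] for i in idx]
        scores = [score_fn(col, sample_labels) for col in sample_cols]
        primary.append(cdf_gradient_cutoff(scores))
    return vote(primary, majority(n_boot))


def baseline_features(rows, labels, score_fns=(abs_correlation, fisher_score)):
    """Phase 2 (function perturbation): majority vote across scoring functions."""
    secondary = [secondary_subset(rows, labels, fn) for fn in score_fns]
    return vote(secondary, majority(len(secondary)))


if __name__ == "__main__":
    # Synthetic data: feature 0 is strongly informative, feature 1 weakly, 2-5 are noise.
    rng = random.Random(42)
    labels = [i % 2 for i in range(200)]
    rows = [[y + rng.gauss(0, 0.1), y + rng.gauss(0, 0.3)]
            + [rng.gauss(0, 1) for _ in range(4)] for y in labels]
    print(sorted(baseline_features(rows, labels)))
```

Majority voting is used at both levels only because it is the simplest aggregation to read. Intersection, union, or rank averaging are equally plausible choices, and the paper itself should be consulted for the rules HEFS actually uses.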
Pages: 153-166
Page count: 14
Related papers (50 in total)
  • [21] Evaluating Machine Learning-Based Feature Selection Methods for Diagnosing Parkinson's Disease Under the SVM Framework
    Thirapanish, Wiput
    Kantavat, Pittipol
    Wanvarie, Dittaya
    Chuangsuwanich, Ekapol
    Punyabukkana, Proadpran
    2024 7TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA, ICAIBD 2024, 2024, : 409 - 415
  • [22] Machine Learning-Based Feature Selection and Classification for the Experimental Diagnosis of Trypanosoma cruzi
    Hevia-Montiel, Nidiyare
    Perez-Gonzalez, Jorge
    Neme, Antonio
    Haro, Paulina
    ELECTRONICS, 2022, 11 (05)
  • [23] A feature selection-driven machine learning framework for anomaly-based intrusion detection systems
    Emirmahmutoglu, Emre
    Atay, Yilmaz
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2025, 18 (03)
  • [24] Modeling Hybrid Feature-Based Phishing Websites Detection Using Machine Learning Techniques
    Das Guptta, S.
    Shahriar, K. T.
    Alqahtani, H.
    Alsalman, D.
    Sarker, I. H.
    ANNALS OF DATA SCIENCE, 2024, 11 (01) : 217 - 242
  • [25] Automatic Feature Extraction and Selection For Machine Learning Based Intrusion Detection
    Liu, Jinjie
    Chung, Sun Sunnie
    2019 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI 2019), 2019, : 1400 - 1405
  • [26] An Effective Feature Selection Algorithm for Machine Learning-based Malicious Traffic Detection
    Fei, Chao
    Xia, Nian
    Tsai, Pang-Wei
    Lu, Yang
    Pan, Xiaonan
    Gong, Junli
    2024 19TH ASIA JOINT CONFERENCE ON INFORMATION SECURITY, ASIAJCIS 2024, 2024, : 91 - 98
  • [27] Network Intrusion Detection with Two-Phased Hybrid Ensemble Learning and Automatic Feature Selection
    Mananayaka, Asanka Kavinda
    Chung, Sun Sunnie
    IEEE ACCESS, 2023, 11 : 45154 - 45167
  • [28] Reviewing various feature selection techniques in machine learning-based botnet detection
    Baruah, Sangita
    Borah, Dhruba Jyoti
    Deka, Vaskar
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (12)
  • [29] Machine Learning-Based Cardiovascular Disease Detection Using Optimal Feature Selection
    Ullah, Tahseen
    Ullah, Syed Irfan
    Ullah, Khalil
    Ishaq, Muhammad
    Khan, Ahmad
    Ghadi, Yazeed Yasin
    Algarni, Abdulmohsen
    IEEE ACCESS, 2024, 12 : 16431 - 16446
  • [30] Enhancing intrusion detection in IoT networks using machine learning-based feature selection and ensemble models
    Almotairi, Ayoob
    Atawneh, Samer
    Khashan, Osama A.
    Khafajah, Nour M.
    SYSTEMS SCIENCE & CONTROL ENGINEERING, 2024, 12 (01)