A New Ensemble Model for Phishing Detection Based on Hybrid Cumulative Feature Selection

Cited by: 5
Authors
Prince, Md Sirajum Munir [1]
Hasan, Asib [1]
Shah, Faisal Muhammad [1]
Affiliations
[1] Ahsanullah Univ Sci & Technol, Dept Comp Sci & Engn, Dhaka, Bangladesh
Source
11TH IEEE SYMPOSIUM ON COMPUTER APPLICATIONS & INDUSTRIAL ELECTRONICS (ISCAIE 2021) | 2021
Keywords
Phishing detection; Feature selection; Machine learning; Ensemble classification; Majority voting;
DOI
10.1109/ISCAIE51753.2021.9431782
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a new ensemble model for phishing detection based on hybrid cumulative feature selection (PDCFS). The model partitions the main dataset into n partitions based on the features available in the dataset. The dataset is fed into multiple feature selection methods (Chi-Square, Gain Ratio, Information Gain, Pearson Correlation Coefficient, and Principal Component Analysis) and arranged into n datasets, each time taking the top-n features given by the filter method, together with the class label, and discarding the remaining ones. Each partition of the dataset is then used for training and testing with 5-fold cross-validation, applying several classifiers: Support Vector Machine, Naive Bayes, C4.5, Random Forest, JRip, PART, and k-Nearest Neighbors. The results are then voted on to obtain the best possible result. Majority voting is applied both to the reduced feature subsets obtained through the feature selection steps and to the full feature set, so that the two can be compared. Overall, the results suggest that Random Forest with a reduced set of 32 features tops the results with an accuracy of 98.36%, while the proposed PDCFS achieves an accuracy of 98.24%. PDCFS also demonstrates comparable performance when compared with other hybrid models.
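The two mechanical steps named in the abstract, building cumulative top-n feature subsets from a ranking and combining predictions by majority voting, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the feature names, the ranking, and the prediction lists are hypothetical, and the abstract does not say exactly how the votes are pooled, so the sketch simply assumes one label list per voter.

```python
from collections import Counter


def cumulative_feature_subsets(ranked_features):
    """Return the top-1, top-2, ..., top-n prefixes of a feature ranking."""
    return [ranked_features[:k] for k in range(1, len(ranked_features) + 1)]


def majority_vote(predictions):
    """Combine per-voter label lists by majority vote, sample by sample.

    With an odd number of voters and binary labels there are no ties;
    otherwise the label counted first wins (an assumption of this sketch).
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical ranking, best feature first, as a filter method might return it.
ranking = ["url_length", "has_ip", "num_dots"]
subsets = cumulative_feature_subsets(ranking)

# Hypothetical predictions from three voters on four samples (1 = phishing).
preds = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
voted = majority_vote(preds)
```

In a fuller pipeline, each subset would be paired with the class label, evaluated with 5-fold cross-validation for every classifier, and the resulting predictions passed to `majority_vote`.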
Pages: 7-12
Number of pages: 6