A New Ensemble Model for Phishing Detection Based on Hybrid Cumulative Feature Selection

Cited by: 5

Authors:

Prince, Md Sirajum Munir [1]

Hasan, Asib [1]

Shah, Faisal Muhammad [1]

Affiliations:

[1] Ahsanullah Univ Sci & Technol, Dept Comp Sci & Engn, Dhaka, Bangladesh

Source:

11TH IEEE SYMPOSIUM ON COMPUTER APPLICATIONS & INDUSTRIAL ELECTRONICS (ISCAIE 2021) | 2021

Keywords:

Phishing detection; Feature selection; Machine learning; Ensemble classification; Majority voting;

DOI:

10.1109/ISCAIE51753.2021.9431782

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A New Ensemble Model for Phishing Detection Based on Hybrid Cumulative Feature Selection (PDCFS) is a model that partitions the main dataset into n partitions based on the features available in the dataset. The dataset is fed into multiple feature selection methods (Chi-Square, Gain Ratio, Information Gain, Pearson Correlation Coefficient, and Principal Component Analysis) and arranged into n datasets, each taking the top-n features given by the filter method together with the class label, and discarding the remaining features each time. Each partition of the dataset is then used for training and testing with 5-fold cross-validation, applying a number of classifiers: Support Vector Machine, Naive Bayes, C4.5, Random Forest, JRip, PART, and k-Nearest Neighbors. Next, the classifier outputs are combined by voting to get the best possible result. Majority voting is applied to both the reduced feature subsets obtained through the feature selection steps and the full feature set, so that the two kinds of feature set can be compared. Overall, the results suggest that Random Forest with a reduced feature set of 32 features tops the results with an accuracy of 98.36%, while the proposed PDCFS achieves an accuracy of 98.24%. PDCFS also demonstrates comparable performance when compared with other hybrid models.
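
The two mechanical steps named in the abstract, building cumulative top-n feature subsets from a ranking and combining classifier outputs by majority vote, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature names, the interpretation of "cumulative" as top-1 through top-n prefixes of one ranking, and the tie-breaking rule are all assumptions made for illustration. The ranking itself (for example from Chi-Square or Information Gain) and the classifiers are taken as given.

```python
from collections import Counter


def cumulative_subsets(ranked_features):
    """Return the top-1, top-2, ..., top-n prefixes of a ranked feature list.

    Assumption: "cumulative" means each subset extends the previous one
    by the next-best feature in the ranking.
    """
    return [ranked_features[:k] for k in range(1, len(ranked_features) + 1)]


def project(rows, features, label="label"):
    """Keep only the chosen features plus the class label in each row."""
    keep = list(features) + [label]
    return [{name: row[name] for name in keep} for row in rows]


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    Assumption: ties go to the label seen first among that sample's votes.
    """
    voted = []
    for votes in zip(*predictions):
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


# Hypothetical ranking and data, for illustration only.
ranked = ["url_length", "has_ip", "num_dots"]
rows = [
    {"url_length": 54, "has_ip": 1, "num_dots": 3, "label": "phishing"},
    {"url_length": 18, "has_ip": 0, "num_dots": 1, "label": "legitimate"},
]
datasets = [project(rows, subset) for subset in cumulative_subsets(ranked)]

# Hypothetical outputs of three classifiers on the same two samples.
votes = [
    ["phishing", "legitimate"],
    ["phishing", "phishing"],
    ["legitimate", "legitimate"],
]
final_labels = majority_vote(votes)
```

Here `datasets` holds one reduced dataset per subset size, each of which could be passed to a classifier under 5-fold cross-validation, and `final_labels` holds the voted label for each sample.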


Pages: 7-12

Page count: 6
