Prediction of phishing websites using machine learning

Cited by: 0
Authors
Mithilesh Kumar Pandey
Munindra Kumar Singh
Saurabh Pal
B. B. Tiwari
Affiliations
[1] VBS Purvanchal University, Department of Computer Applications
[2] VBS Purvanchal University, Department of Electronics and Communication
Source
Spatial Information Research | 2023 / Vol. 31
Keywords
Machine learning; Decision tree algorithm; Random forest algorithm; Gradient boosting and phishing websites
DOI
Not available
Abstract
With the growing popularity of information science, more applications are being integrated with websites that can be accessed directly through the internet. This has increased the opportunity for attackers to steal personal information. Several strategies have been proposed to identify phishing attacks; however, there is still room for improvement in the fight against phishing. The objective of this research paper is to develop a more accurate prediction model using Decision Tree (DT), Random Forest (RF) and Gradient Boosting Classifier (GBC) models combined with three feature selection techniques: Extra Tree (ET), Chi-Square and Recursive Feature Elimination (RFE). The phishing websites dataset contains 89 features. We therefore first applied the Extra Tree and Chi-Square feature selection methods to identify a limited set of important features. We then used Recursive Feature Elimination to reduce the dataset to the optimal set of important features. We compared the performance of the models built with these machine learning algorithms and found that GBC gives the best prediction performance, followed by RF and DT. In each case, how well the models capture the trends across various phishing cases was evaluated using R-squared, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
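The Chi-Square step ranks each feature by how strongly its values depend on the class label. Below is a minimal, dependency-free sketch that follows the same scoring rule as scikit-learn's `chi2` scorer: observed per-class feature sums are compared against the sums expected from class frequencies. It assumes non-negative feature values stored in plain Python lists. The names `chi2_scores` and `top_k_features` are hypothetical and are not the authors' code.

```python
def chi2_scores(X, y):
    """Chi-square score of each non-negative feature column against class labels y."""
    n = len(y)
    n_features = len(X[0])
    classes = sorted(set(y))
    # Prior probability of each class in the training data.
    class_prob = {c: sum(1 for label in y if label == c) / n for c in classes}
    scores = []
    for j in range(n_features):
        # Observed: total of feature j's values within each class.
        observed = {c: 0.0 for c in classes}
        for row, label in zip(X, y):
            observed[label] += row[j]
        total = sum(observed.values())
        score = 0.0
        for c in classes:
            # Expected: feature total split in proportion to class frequency.
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed[c] - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k_features(scores, k):
    """Indices of the k highest-scoring features, best first (ties keep column order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

Take `X = [[1, 0, 3], [1, 0, 2], [0, 1, 3], [0, 1, 2]]` and `y = [1, 1, 0, 0]`. The first two columns each score 2.0, while the third scores 0.0 because its values are spread evenly across both classes. `top_k_features(scores, 2)` therefore keeps columns 0 and 1.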
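Recursive Feature Elimination repeatedly refits a model on the current feature subset and discards the least important feature until the desired number remains. The following is a model-agnostic sketch. It assumes that importances come from a caller-supplied function; in practice this could be the feature importances of a tree ensemble refitted on the surviving features. The names `recursive_feature_elimination` and `importance_fn` are hypothetical, as are the feature names in the usage note.

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Drop the least important feature, one per round, until n_keep remain.

    importance_fn(subset) must return a dict mapping every feature in `subset`
    to an importance score recomputed on that subset (e.g. by refitting a model).
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    remaining = list(features)
    while len(remaining) > n_keep:
        importances = importance_fn(remaining)
        # Eliminate the feature the current model relies on least.
        weakest = min(remaining, key=lambda f: importances[f])
        remaining.remove(weakest)
    return remaining
```

As a usage example, suppose the hypothetical weights are `{"url_length": 0.1, "has_ip": 0.5, "https": 0.3, "age": 0.05}`, renormalized over the surviving subset at each round. With `n_keep=2`, the loop first removes `age` and then `url_length`, leaving `["has_ip", "https"]`. Eliminating one feature per round is the simplest choice; a larger step trades accuracy of the ranking for fewer refits.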
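The evaluation reports R-squared, RMSE and MAE for the classifiers. Below is a minimal sketch of the three measures. It assumes the class labels are encoded numerically (for example, 1 for phishing and 0 for legitimate) so that the regression formulas apply directly to the predictions. The function names are illustrative only.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between true labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true labels are identical")
    return 1.0 - ss_res / ss_tot
```

Consider eight balanced 0/1 labels with a single misclassification. MAE is 0.125 and RMSE is about 0.354. R-squared is 0.5, since the residual sum of squares is 1 and the total sum of squares is 2.0. On 0/1 labels, R-squared becomes negative whenever the predictions do worse than always predicting the mean label.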
Pages: 157-166
Page count: 9