DTOF-ANN: An Artificial Neural Network phishing detection model based on Decision Tree and Optimal Features

Cited by: 43
Authors
Zhu, Erzhou [1]
Ju, Yinyin [1]
Chen, Zhile [1]
Liu, Feng [1]
Fang, Xianyong [1]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Hefei 230601, Peoples R China
Keywords
Phishing detection; Feature selection; Neural network; K-medoids clustering; FEATURE-SELECTION;
DOI
10.1016/j.asoc.2020.106505
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Phishing has recently emerged as one of the biggest threats to everyday network use. Phishing attackers disguise illegal URLs as legitimate ones and use social engineering channels, such as email and SMS, to steal users' private information, which calls for an effective method of preventing phishing attacks and reducing the losses they cause. Neural networks can be used to detect and prevent phishing attacks because of their strong active learning abilities on massive datasets and their high accuracy in data classification. However, duplicate points in the public datasets, together with negative and useless features in the feature vectors, can trap the training of neural networks in over-fitting, which weakens the trained classifier when it is used to detect phishing websites. To tackle this shortcoming, this paper proposes DTOF-ANN (Decision Tree and Optimal Features based Artificial Neural Network), a neural-network phishing detection model based on decision tree and optimal feature selection. First, the traditional K-medoids clustering algorithm is improved with an incremental selection of initial centers to remove the duplicate points from the public datasets. Then, an optimal feature selection algorithm based on a newly defined feature evaluation index, a decision tree and a local search method is designed to prune out the negative and useless features. Finally, the optimal structure of the neural network classifier is constructed by properly adjusting parameters, and the classifier is trained on the selected optimal features. Experimental results demonstrate that DTOF-ANN exhibits higher performance than many existing methods. (C) 2020 Elsevier B.V. All rights reserved.
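The first step in the abstract (clustering, then removing duplicate points) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify how the initial centers are selected incrementally, so the farthest-first rule below is an assumption, as are the Hamming distance, the tuple-encoded rows, and every function name (`distance`, `initial_medoids`, `k_medoids`, `remove_duplicates`). Clustering here only groups rows so that duplicates are looked for among cluster-mates.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]  # one feature vector; this encoding is an assumption


def distance(a: Row, b: Row) -> int:
    """Hamming distance: number of positions where two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def initial_medoids(rows: Sequence[Row], k: int) -> List[int]:
    """Pick k initial medoids one at a time (farthest-first).

    Hypothetical stand-in for the paper's incremental selection rule.
    """
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and len(rows)")
    medoids = [0]  # assumption: start from the first row
    while len(medoids) < k:
        candidates = [i for i in range(len(rows)) if i not in medoids]
        # Add the row that is farthest from its nearest chosen medoid.
        farthest = max(
            candidates,
            key=lambda i: min(distance(rows[i], rows[m]) for m in medoids),
        )
        medoids.append(farthest)
    return medoids


def k_medoids(rows: Sequence[Row], k: int, max_iter: int = 20) -> Dict[int, List[int]]:
    """Plain K-medoids; returns a mapping medoid index -> member indices."""
    medoids = initial_medoids(rows, k)
    clusters: Dict[int, List[int]] = {}
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest medoid (first one on ties).
        clusters = {m: [] for m in medoids}
        for i, row in enumerate(rows):
            nearest = min(medoids, key=lambda m: distance(row, rows[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance to its members.
        updated = [
            min(members, key=lambda c: sum(distance(rows[c], rows[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if sorted(updated) == sorted(medoids):
            break
        medoids = updated
    return clusters


def remove_duplicates(rows: Sequence[Row], k: int) -> List[Row]:
    """Keep the first copy of each row within every cluster, in input order."""
    kept: List[int] = []
    for members in k_medoids(rows, k).values():
        seen = set()
        for i in members:
            if rows[i] not in seen:
                seen.add(rows[i])
                kept.append(i)
    return [rows[i] for i in sorted(kept)]


sample = [(1, 0, 1), (1, 0, 1), (0, 1, 0), (0, 1, 0), (1, 1, 1)]
deduplicated = remove_duplicates(sample, k=2)
```

Identical rows are at distance 0 from each other and therefore share a nearest medoid, which is why checking for duplicates inside each cluster is enough in this sketch.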
Pages: 14