PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks

Cited by: 41
Authors
Wang, Weiping [1]
Zhang, Feng [1]
Luo, Xi [2,3]
Zhang, Shigeng [1,4]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha, Hunan, Peoples R China
[2] Hunan Police Acad, Hunan Prov Key Lab Network Invest Technol, Changsha, Hunan, Peoples R China
[3] Hunan Police Acad, Dept Informat Technol, Changsha, Hunan, Peoples R China
[4] Chinese Acad Sci, State Key Lab Informat Secur, Inst Informat Engn, Beijing 100093, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
WEBSITES; FRAMEWORK; WEBPAGES; MODEL;
DOI
10.1155/2019/2595794
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Phishing uses well-designed counterfeit websites to lure online users into visiting forged web pages and disclosing private, sensitive information such as account numbers and passwords. Existing antiphishing approaches are mostly based on page-related features, which require crawling the content of web pages and accessing third-party search engines or DNS services. This makes them slow at detecting phishing and heavily dependent on the network environment and third-party services. In this paper, we propose a fast phishing website detection approach called PDRCNN that relies only on the URL of the website. PDRCNN neither retrieves the content of the target website nor uses any third-party services, as previous approaches do. It encodes the information in a URL into a two-dimensional tensor and feeds the tensor into a newly designed deep neural network to classify the original URL. We first use a bidirectional LSTM network to extract global features of the constructed tensor, so that the representation of each character carries information from the whole URL string. We then use a CNN to automatically judge which characters play key roles in phishing detection, capture the key components of the URL, and compress the extracted features into a fixed-length vector. By combining the two types of networks, PDRCNN achieves better performance than either network alone. We built a dataset of nearly 500,000 URLs obtained from Alexa and PhishTank. Experimental results show that PDRCNN achieves a detection accuracy of 97% and an AUC of 99%, which is much better than state-of-the-art approaches. Furthermore, recognition is very fast: with the trained PDRCNN model, the average detection time per URL is only 0.4 ms.
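The abstract says that a URL is encoded into a two-dimensional tensor but does not give the encoding. The sketch below illustrates one simple way to do this, assuming a character-level one-hot encoding with a fixed length; the alphabet, the `MAX_LEN` value, the lowercasing step, and the `encode_url` helper are all illustrative assumptions, not the scheme used by PDRCNN.

```python
import string

# Assumed alphabet: lowercase letters, digits, and punctuation common in URLs.
ALPHABET = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
UNKNOWN_INDEX = len(ALPHABET)   # shared slot for any character outside ALPHABET
VECTOR_SIZE = len(ALPHABET) + 1
MAX_LEN = 200                   # assumed fixed length: longer URLs are truncated


def encode_url(url, max_len=MAX_LEN):
    """Return a max_len x VECTOR_SIZE one-hot matrix (list of lists) for a URL."""
    rows = []
    # Lowercasing is an assumption made to keep the alphabet small.
    for ch in url.lower()[:max_len]:
        row = [0] * VECTOR_SIZE
        row[CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX)] = 1
        rows.append(row)
    # Pad short URLs with all-zero rows so every tensor has the same shape.
    while len(rows) < max_len:
        rows.append([0] * VECTOR_SIZE)
    return rows


tensor = encode_url("http://example.com/login")
print(len(tensor), len(tensor[0]))  # rows (positions) x columns (characters)
```

A matrix of this shape, one row per character position, is the kind of input that a recurrent layer followed by a convolutional layer could consume; a learned character embedding could replace the one-hot rows without changing the overall shape of the pipeline.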
Pages: 15