A keyword-based combination approach for detecting phishing webpages

Cited by: 37
Authors
Ding, Yan [1]
Luktarhan, Nurbol [1]
Li, Keqin [2]
Slamu, Wushour [1]
Affiliations
[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi, Peoples R China
[2] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY USA
Funding
China Postdoctoral Science Foundation
Keywords
Heuristic rule; Machine learning; Phishing; Search engine; URL obfuscation techniques;
DOI
10.1016/j.cose.2019.03.018
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, the Search & Heuristic Rule & Logistic Regression (SHLR) combination detection method is proposed to detect the obfuscation techniques commonly used by phishing websites and to improve the efficiency of filtering legitimate webpages. The method consists of three steps. First, the content of the webpage's title tag is submitted as search keywords to the Baidu search engine, and the webpage is considered legitimate if its domain matches the domain name of any of the top-10 search results; otherwise, further evaluation is performed. Second, if the webpage cannot be identified as legitimate, it is checked against heuristic rules defined by character features to determine whether it is a phishing page. The first two steps filter webpages quickly enough to meet the needs of real-time detection. Finally, a logistic regression classifier assesses the remaining pages to enhance the adaptability and accuracy of the detection method. The experimental results show that the SHLR can filter 61.9% of legitimate webpages and identify 22.9% of phishing webpages based on uniform resource locator (URL) lexical information. The accuracy of the SHLR is 98.9%, indicating high phishing detection performance. © 2019 Elsevier Ltd. All rights reserved.
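The three-step cascade described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `search` and `predict_proba` callables are hypothetical stand-ins for the Baidu query and the trained logistic regression model, the two heuristic rules (raw IP host, `@` in the URL) are common examples chosen for illustration because the abstract does not list the paper's rules, and hosts are compared directly instead of extracting registered domains.

```python
import ipaddress
from typing import Callable, Iterable
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def matches_search_results(url: str, result_urls: Iterable[str]) -> bool:
    """Step 1: True if the page's host equals the host of any top-10 hit."""
    page_host = host_of(url)
    top10 = list(result_urls)[:10]
    return bool(page_host) and any(host_of(r) == page_host for r in top10)


def violates_heuristics(url: str) -> bool:
    """Step 2: illustrative character-level rules (assumed, not the paper's)."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True  # a raw IP address is used in place of a domain name
    except ValueError:
        pass
    return "@" in url  # '@' can hide the real host behind fake user info


def classify(
    url: str,
    title: str,
    search: Callable[[str], Iterable[str]],   # hypothetical search wrapper
    predict_proba: Callable[[str], float],    # hypothetical trained model
    threshold: float = 0.5,                   # assumed decision threshold
) -> str:
    """Run the cascade: search filter, then rules, then the classifier."""
    if matches_search_results(url, search(title)):
        return "legitimate"
    if violates_heuristics(url):
        return "phishing"
    # Step 3: only pages undecided by the first two steps reach the model.
    return "phishing" if predict_proba(url) >= threshold else "legitimate"
```

Ordering the cheap checks first means the classifier only runs on pages that the search filter and the rules leave undecided, which is the efficiency argument the abstract makes for real-time detection.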
Pages: 256-275
Page count: 20