Phishing Detection using RDF and Random Forests

Cited by: 0
Authors
Muppavarapu, Vamsee [1]
Rajendran, Archanaa [1]
Vasudevan, Shriram [1]
Affiliation
[1] Amrita Vishwa Vidyapeetham Univ, Dept Comp Sci & Engn, Coimbatore, Tamil Nadu, India
Keywords
Phishing; ensemble learning; RDF models; phishing target; metadata; vocabulary; random forests;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Phishing is one of the major threats of the internet era. In a phishing attack, a legitimate website is cloned and victims are lured to the fake site to disclose personal and confidential information, which can prove costly. Although most websites display a disclaimer warning users about phishing, users tend to ignore it. A disclaimer alone is not a fully responsible measure on the part of websites, yet there is little more that they can do about the problem. Because phishing has persisted for a long time, many approaches have been proposed to detect phishing websites, but few, if any, accurately identify the target websites of these attacks. Our proposed method is novel and extends our previous work: we identify phishing websites using a combined approach that constructs Resource Description Framework (RDF) models and applies ensemble learning algorithms to classify websites. The system is trained using supervised learning techniques. The approach achieves a true positive rate of 98.8%. Because we use a random forest classifier, which can handle missing values in the dataset, we were able to reduce the false positive rate of the system to 1.5%. By combining the strengths of RDF and ensemble learning methods, the system achieves an accuracy of 98.68%.
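The abstract reports three metrics: true positive rate, false positive rate, and accuracy. As a reminder of how these relate to a confusion matrix, here is a minimal Python sketch. The class name `ConfusionCounts`, the function names, and the counts are all hypothetical and chosen for illustration only; they are not the paper's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary phishing classifier (hypothetical helper)."""
    tp: int  # phishing sites correctly flagged as phishing
    fn: int  # phishing sites missed (classified as legitimate)
    fp: int  # legitimate sites wrongly flagged as phishing
    tn: int  # legitimate sites correctly classified as legitimate


def true_positive_rate(c: ConfusionCounts) -> float:
    # Share of actual phishing sites that were detected.
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    # Share of actual legitimate sites that were wrongly flagged.
    return c.fp / (c.fp + c.tn)


def accuracy(c: ConfusionCounts) -> float:
    # Share of all sites that were classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


# Illustrative counts only; NOT taken from the paper.
counts = ConfusionCounts(tp=90, fn=10, fp=5, tn=95)

print(f"TPR:      {true_positive_rate(counts):.3f}")
print(f"FPR:      {false_positive_rate(counts):.3f}")
print(f"Accuracy: {accuracy(counts):.3f}")
```

Accuracy depends on the class balance of the test set as well as on the two rates, so it cannot be derived from the true positive rate and false positive rate alone.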
Pages: 817-824
Page count: 8