Detecting Phishing Attacks from URL by using NLP Techniques

Cited by: 0
Authors
Buber, Ebubekir [1]
Diri, Banu [1]
Sahingoz, Ozgur Koray [2]
Affiliations
[1] Yildiz Tekn Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
[2] Hava Harp Okulu, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Source
2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK) | 2017
Keywords
Machine Learning; Phishing Attack; NLP; Random Forest Algorithm; Cyber Attack Detection; Cyber Security;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Cyber attacks affect many institutions and individuals and cause serious financial losses. Phishing is one of the most common types of cyber attack; it exploits human weaknesses to obtain confidential information. This type of attack threatens almost all internet users and institutions. Reducing the resulting financial loss requires both user awareness and applications able to detect such attacks. In the Phishing Attack Analysis report covering 45 countries for the last quarter of 2016, Turkey ranks second behind China, with an impact rate of approximately 43%. This study first explains the characteristics of phishing attacks and then proposes a machine learning based system to detect them. The proposed system extracts features from URLs using Natural Language Processing (NLP) techniques and examines the URLs used in phishing attacks before they are opened. Of the algorithms tested, Random Forest performed best, with a success rate of 89.9%.
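The abstract does not list the features the authors extracted, so the following is only a minimal sketch of what URL-based lexical feature extraction might look like. It works on the URL string alone, without opening the page. The function name `extract_url_features`, the feature names, and the `SUSPICIOUS_WORDS` list are all assumptions made for illustration and are not taken from the paper.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list for illustration only; not taken from the paper.
SUSPICIOUS_WORDS = {"login", "verify", "secure", "account", "update", "signin"}


def extract_url_features(url: str) -> dict:
    """Turn a URL string into a few simple lexical features without fetching it."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Split on any run of non-alphanumeric characters to get word-like tokens.
    tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
    return {
        "url_length": len(url),
        "dot_count_in_host": host.count("."),
        "has_at_symbol": "@" in url,
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "token_count": len(tokens),
        "longest_token_length": max((len(t) for t in tokens), default=0),
        "suspicious_word_count": sum(t in SUSPICIOUS_WORDS for t in tokens),
    }


features = extract_url_features(
    "http://paypal.com.secure-login.example.net/verify/account"
)
```

A feature dictionary of this kind could then be converted to a numeric vector and passed to a classifier such as Random Forest, the algorithm the abstract reports as performing best.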
Pages: 337-342
Page count: 6