PhishRepo: A Seamless Collection of Phishing Data to Fill a Research Gap in the Phishing Domain

Times cited: 0
Authors
Ariyadasa, Subhash [1, 3]
Fernando, Shantha [2]
Fernando, Subha [1]
Affiliations
[1] Univ Moratuwa, Dept Computat Math, Moratuwa, Sri Lanka
[2] Univ Moratuwa, Dept Comp Sci & Engn, Moratuwa, Sri Lanka
[3] Uva Wellassa Univ, Dept Comp Sci & Informat, Badulla, Sri Lanka
Keywords
Cyberattack; crowdsourcing; internet security; phishing; machine learning; multi-modal data; FEATURES; MODEL; URL
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Machine learning-based anti-phishing solutions face various challenges in collecting diverse multi-modal phishing data. As a result, most previous works have trained with little or no multi-modal data, which introduces several drawbacks. This study therefore aims to develop a phishing data repository that meets the diverse data needs of the anti-phishing domain. To fill this gap, it proposes PhishRepo, an online data repository that collects, verifies, disseminates, and archives phishing data. PhishRepo includes innovative design aspects such as automated submission, deduplication filtering, automated verification, crowdsourcing-based human interaction, an objection reporting window, and target attack prevention techniques. Moreover, the deduplication filter, used for the first time in phishing data collection, significantly impacted the collection process: it eliminated duplicate data, which can cause data leakage, one of the most common machine learning errors. In addition, PhishRepo enables researchers to apply modern machine learning techniques effectively and supports them by removing the hassle of obtaining phishing data. Thoughtful use of PhishRepo is therefore expected to lead to effective anti-phishing solutions in the future, helping to minimise the social engineering crime known as phishing.
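The abstract credits the deduplication filter with removing duplicate submissions, which would otherwise risk landing in both the training and test splits and causing data leakage. The Python sketch below illustrates that general idea only; it is not PhishRepo's implementation, which the abstract does not describe. The names `normalise_url`, `DedupFilter`, and `cross_split_overlap`, the URL normalisation rules, and the use of exact SHA-256 matching on the URL and page HTML are all hypothetical choices made for illustration.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalise_url(url: str) -> str:
    """Canonicalise a URL so trivial variants compare equal (hypothetical rules)."""
    parts = urlsplit(url.strip())
    netloc = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    # urlsplit already lowercases the scheme; drop the fragment, keep the query.
    return urlunsplit((parts.scheme, netloc, path, parts.query, ""))


class DedupFilter:
    """Hypothetical filter: rejects a submission whose normalised URL
    or exact page content has already been accepted."""

    def __init__(self) -> None:
        self._url_keys: set[str] = set()
        self._content_keys: set[str] = set()

    def accept(self, url: str, html: str) -> bool:
        url_key = _sha256(normalise_url(url))
        content_key = _sha256(html)
        if url_key in self._url_keys or content_key in self._content_keys:
            return False  # duplicate: keeping it could leak across splits later
        self._url_keys.add(url_key)
        self._content_keys.add(content_key)
        return True


def cross_split_overlap(train: list[str], test: list[str]) -> int:
    """Count test samples whose exact content also appears in the training split."""
    seen = {_sha256(s) for s in train}
    return sum(_sha256(s) in seen for s in test)


if __name__ == "__main__":
    f = DedupFilter()
    print(f.accept("HTTP://Example.com/login/", "<html>a</html>"))  # True
    print(f.accept("http://example.com/login", "<html>b</html>"))   # False: same normalised URL
    print(f.accept("http://other.test/x", "<html>a</html>"))        # False: same page content
    print(f.accept("http://other.test/y", "<html>c</html>"))        # True
    print(cross_split_overlap(["a", "b"], ["b", "c", "b"]))          # 2
```

Exact hashing only catches byte-identical pages. Catching near-duplicates, such as the same phishing kit served with a different per-victim token, would need a similarity measure, which this sketch does not attempt.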
Pages: 850-865
Page count: 16