Dark Web Illegal Activities Crawling and Classifying Using Data Mining Techniques

Cited by: 11
Authors
Alaidi A.H.M. [1]
Al_airaji R.M. [2]
ALRikabi H.T.S. [3]
Aljazaery I.A. [4]
Abbood S.H. [5]
Affiliations
[1] College of Computer Science and Information Technology, Wasit University, Wasit
[2] Information Technology College, Babylon University, Babylon
[3] College of Engineering, Wasit University, Wasit
[4] Engineering College, Babylon University, Babylon
[5] Faculty of Engineering, University Technology Malaysia (UTM), Johor
Keywords
dark web; Linear Support Vector Classifier; Naïve Bayes
DOI
10.3991/ijim.v16i10.30209
Abstract
The dark web is an umbrella term for illicit activities carried out by anonymous persons or organizations, which makes those activities difficult to trace. Illicit content on the dark web is constantly updated and changed, so collecting and classifying it is difficult and time-consuming. The problem has recently drawn attention from both industry and academia. To address it, this article proposes a crawler that collects dark web pages, cleans them, and saves them in a document database. The gathered pages are then classified automatically into five classes. Classification uses a Linear Support Vector Classifier (SVC) and Naïve Bayes (NB), with Term Frequency-Inverse Document Frequency (TF-IDF) as the text representation. In the experiments, SVC and NB achieved accuracy rates of 92% and 81%, respectively. © 2022
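The clean-then-classify steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, shows only a Naïve Bayes classifier over TF-IDF weights (the Linear SVC is omitted), and skips crawling and database storage. The class labels `drugs` and `fraud`, the sample pages, and the names `clean_page` and `TfidfNaiveBayes` are hypothetical; the abstract does not name the five classes.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_page(html):
    """Strip markup and return lowercase word tokens."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.parts).lower())


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, docs, labels):
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        # Smoothed inverse document frequency, always positive.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.weights = defaultdict(Counter)  # label -> term -> summed TF-IDF
        self.priors = Counter(labels)
        for doc, label in zip(docs, labels):
            for term, tf in Counter(doc).items():
                self.weights[label][term] += tf * self.idf[term]
        self.total = n
        return self

    def predict(self, doc):
        vocab = len(self.idf)
        best, best_score = None, -math.inf
        for label, prior in self.priors.items():
            denom = sum(self.weights[label].values()) + vocab
            score = math.log(prior / self.total)
            for term, tf in Counter(doc).items():
                if term in self.idf:  # ignore terms unseen in training
                    w = tf * self.idf[term]
                    score += w * math.log((self.weights[label][term] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical training pages and labels, for illustration only.
pages = [
    ("<html><body><h1>Cheap pills</h1><p>buy pills and powder online</p></body></html>", "drugs"),
    ("<html><script>var x=1;</script><p>powder pills shipped worldwide</p></html>", "drugs"),
    ("<html><p>cloned cards and stolen bank accounts</p></html>", "fraud"),
    ("<html><p>bank cards dumps with pin</p></html>", "fraud"),
]
model = TfidfNaiveBayes().fit(
    [clean_page(html) for html, _ in pages],
    [label for _, label in pages],
)
predicted = model.predict(clean_page("<p>fresh pills and powder</p>"))
print(predicted)
```

A real pipeline would need far more training pages per class and a held-out test set before any accuracy figure could be compared with the reported results.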
Pages: 122-139
Number of pages: 17