Internet Data Analysis Methodology for Cyberterrorism Vocabulary Detection, Combining Techniques of Big Data Analytics, NLP and Semantic Web

Cited by: 10
Authors
Castillo-Zuniga, Ivan [1, 2]
Javier Luna-Rosas, Francisco [3 ]
Rodriguez-Martinez, Laura C. [4 ]
Munoz-Arteaga, Jaime [5 ]
Ivan Lopez-Veyna, Jaime [6 ]
Rodriguez-Diaz, Mario A. [3 ]
Affiliations
[1] Inst Tecnol Llano, Aguascalientes, Aguascalientes, Mexico
[2] Inst Tecnol Aguascalientes, Aguascalientes, Aguascalientes, Mexico
[3] TecNM Inst Tecnol Aguascalientes, Aguascalientes, Aguascalientes, Mexico
[4] Tecnol Nacl Mexico IT Aguascalientes, Aguascalientes, Aguascalientes, Mexico
[5] Univ Autonoma Aguascalientes, Aguascalientes, Aguascalientes, Mexico
[6] Inst Tecnol Zacatecas, Zacatecas, Zacatecas, Mexico
Keywords
Big Data Analytics; Cyberterrorism; Internet Data Analysis; Machine Learning; Natural Language Processing; Parallel Processing; Semantic Web; Text Mining
DOI
10.4018/IJSWIS.2020010104
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article presents a methodology for analyzing data on the Internet that combines Big Data analytics, NLP, and Semantic Web techniques to extract knowledge from large amounts of information on the web. To test the effectiveness of the proposed method, webpages about cyberterrorism were analyzed as a case study. The procedure implemented a genetic strategy in parallel that integrates a crawler, which locates and downloads information from the web, with vocabulary retrieval based on NLP techniques (tokenization, stop-word removal, TF, TF-IDF) and on stemming and synonym methods. To extract knowledge, a dataset was built by describing a linguistic corpus with semantic ontologies that account for the characteristics of cyberterrorism; the dataset was analyzed with the Random Forests (parallel), Boosting, SVM, neural network, k-NN, and Bayes algorithms. The results show 95.62% accuracy in detecting cyberterrorism vocabulary, confirmed through cross-validation, and a 576% time saving with parallel processing.
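The vocabulary-retrieval steps named in the abstract (tokenization, stop-word removal, TF, TF-IDF) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the function names, and the choice of the classic tf × log(N/df) weighting are all assumptions made for illustration, and stemming and synonym handling are left out.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the article).
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "on", "is"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_frequency(tokens):
    """Relative frequency of each term within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def tf_idf(docs):
    """Score every term of every document as tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = term_frequency(toks)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores


if __name__ == "__main__":
    pages = ["The attack on the network", "The network is secure"]
    for page, weights in zip(pages, tf_idf(pages)):
        print(page, weights)
```

With this weighting, a term that occurs in every document scores zero, so only terms that distinguish one page from the others keep a positive weight.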
Pages: 69-86
Page count: 18