Natural Language Processing Model for Automatic Analysis of Cybersecurity-Related Documents

Cited by: 16

Authors:

Georgescu, Tiberiu-Marian [1]

Affiliations:

[1] Bucharest Univ Econ Studies, Dept Econ Informat & Cybernet, 6 Piata Romana, Bucharest 010374, Romania

Source:

SYMMETRY-BASEL | 2020 / Vol. 12 / Issue 03

Keywords:

cybersecurity; machine learning; ontologies; named entity recognition; natural language processing; relation extraction;

DOI:

10.3390/sym12030354

Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

This paper describes the development and implementation of a natural language processing model based on machine learning which performs cognitive analysis for cybersecurity-related documents. A domain ontology was developed using a two-step approach: (1) the symmetry stage and (2) the machine adjustment. The first stage is based on the symmetry between the way humans represent a domain and the way machine learning solutions do. Therefore, the cybersecurity field was initially modeled based on the expertise of cybersecurity professionals. A dictionary of relevant entities was created; the entities were classified into 29 categories and later implemented as classes in a natural language processing model based on machine learning. After running successive performance tests, the ontology was remodeled from 29 to 18 classes. Using the ontology, a natural language processing model based on a supervised learning model was defined. We trained the model using sets of approximately 300,000 words. Remarkably, our model obtained an F1 score of 0.81 for named entity recognition and 0.58 for relation extraction, showing superior results compared to other similar models identified in the literature. Furthermore, in order to be easily used and tested, a web application that integrates our model as the core component was developed.

Pages: 19
