Natural Language Processing Model for Automatic Analysis of Cybersecurity-Related Documents

Cited by: 19
Authors
Georgescu, Tiberiu-Marian [1]
Affiliations
[1] Bucharest Univ Econ Studies, Dept Econ Informat & Cybernet, 6 Piata Romana, Bucharest 010374, Romania
Source
SYMMETRY-BASEL | 2020 / Vol. 12 / Issue 03
Keywords
cybersecurity; machine learning; ontologies; named entity recognition; natural language processing; relation extraction
DOI
10.3390/sym12030354
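Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract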
This paper describes the development and implementation of a natural language processing model based on machine learning which performs cognitive analysis for cybersecurity-related documents. A domain ontology was developed using a two-step approach: (1) the symmetry stage and (2) the machine adjustment. The first stage is based on the symmetry between the way humans represent a domain and the way machine learning solutions do. Therefore, the cybersecurity field was initially modeled based on the expertise of cybersecurity professionals. A dictionary of relevant entities was created; the entities were classified into 29 categories and later implemented as classes in a natural language processing model based on machine learning. After running successive performance tests, the ontology was remodeled from 29 to 18 classes. Using the ontology, a natural language processing model based on a supervised learning model was defined. We trained the model using sets of approximately 300,000 words. Remarkably, our model obtained an F1 score of 0.81 for named entity recognition and 0.58 for relation extraction, showing superior results compared to other similar models identified in the literature. Furthermore, in order to be easily used and tested, a web application that integrates our model as the core component was developed.
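The F1 scores quoted in the abstract combine precision and recall into a single number. The record does not say how the paper matched predicted entities against gold annotations, so the following is only a minimal sketch, assuming exact-match scoring over (start, end, label) spans. The function name `entity_f1` and the labels `MALWARE`, `VULNERABILITY`, and `ATTACK` are hypothetical and are not taken from the paper's 18 classes.

```python
def entity_f1(predicted, gold):
    """Return (precision, recall, F1) for exact-match entity spans.

    Each entity is a (start, end, label) tuple. A prediction counts as
    correct only if its span and label both match a gold entity exactly.
    This matching rule is an assumption, not the paper's stated protocol.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one short document.
gold = [(0, 2, "MALWARE"), (5, 6, "VULNERABILITY"), (9, 11, "ATTACK")]
predicted = [(0, 2, "MALWARE"), (5, 6, "ATTACK"), (9, 11, "ATTACK")]

precision, recall, f1 = entity_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the second prediction has the right span but the wrong label, so it counts as both a false positive and a false negative under exact matching.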
Pages: 19