Automated Extraction of Software Names from Vulnerability Reports using LSTM and Expert System

Cited by: 0

Authors:

Khokhlov, Igor [1]

Okutan, Ahmet [2]

Bryla, Ryan [2]

Simmons, Steven [2]

Mirakhorli, Mehdi [2]

Affiliations:

[1] Sacred Heart Univ, Fairfield, CT 06825 USA

[2] Rochester Inst Technol, Rochester, NY USA

Source:

2022 IEEE 29th Annual Software Technology Conference (STC 2022) | 2022

Keywords:

Common Product Enumeration; Common Vulnerabilities and Exposures; Natural Language Processing; Software Product Name Extraction; Software Vulnerability

DOI:

10.1109/STC55697.2022.00024

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Discipline classification codes:

081202; 0835

Abstract:

The security community closely monitors software vulnerabilities so that security and privacy issues in software systems can be addressed in a timely manner. Before a vulnerability management system publishes a vulnerability, the vulnerability must be characterized to highlight its unique attributes, including the affected software products and versions, so that security professionals can prioritize their patches. Associating product names and versions with disclosed vulnerabilities can be labor-intensive, which may delay publication and remediation and thereby give attackers more time to exploit the vulnerabilities. This work proposes a machine learning method that automatically extracts software product names and versions from unstructured CVE descriptions. It uses Word2Vec and Char2Vec models to create context-aware features from CVE descriptions and uses these features to train a Named Entity Recognition (NER) model based on bidirectional Long Short-Term Memory (LSTM) networks. Based on the attributes of product names and versions in previously published CVE descriptions, we created a set of Expert System (ES) rules to refine the predictions of the NER model and improve the performance of the method. Experimental results on real-life CVE examples indicate that, with the trained NER model and the set of ES rules, software names and versions in unstructured CVE descriptions can be identified with F-measure values above 0.95.
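
The abstract does not list the ES rules themselves. As a rough illustration only, the Python sketch below shows how rule-based post-processing of NER tags could look. The BIO label names (SN for software name, SV for software version), the version regular expression, the function name `refine_tags`, and both rules are assumptions made for this example. They are not the authors' rules.

```python
import re

# Assumed pattern for version-like tokens such as "2.4.1", "v1.0" or "3.x".
VERSION_RE = re.compile(r"^v?\d+(\.(\d+|x))+$", re.IGNORECASE)


def refine_tags(tokens, tags):
    """Apply two illustrative (hypothetical) rules to NER output.

    tokens: list of word strings from a CVE description.
    tags:   list of BIO labels predicted by an NER model, drawn from
            "O", "B-SN", "I-SN", "B-SV", "I-SV" (assumed label scheme).
    Returns a new list of labels; the inputs are not modified.
    """
    refined = list(tags)

    # Rule 1 (assumed): a version-like token that directly follows a
    # software-name token, but was left as "O", is relabeled as a version.
    for i, tok in enumerate(tokens):
        if (
            refined[i] == "O"
            and i > 0
            and refined[i - 1] in ("B-SN", "I-SN")
            and VERSION_RE.match(tok)
        ):
            refined[i] = "B-SV"

    # Rule 2 (assumed): an "I-" label that does not continue an entity of
    # the same type is repaired to the matching "B-" label.
    for i, tag in enumerate(refined):
        if tag.startswith("I-"):
            prev = refined[i - 1] if i > 0 else "O"
            if prev == "O" or prev[2:] != tag[2:]:
                refined[i] = "B-" + tag[2:]

    return refined


if __name__ == "__main__":
    # "ExampleApp" is a made-up product name used only for illustration.
    tokens = ["Overflow", "in", "ExampleApp", "2.4.1", "allows", "attacks"]
    predicted = ["O", "O", "I-SN", "O", "O", "O"]
    print(list(zip(tokens, refine_tags(tokens, predicted))))
```

Rules of this kind run after the model and only touch labels that violate a simple, checkable pattern, which keeps the learned predictions intact everywhere else.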

Citation:

Pages: 125-134

Page count: 10
