Automated Extraction of Software Names from Vulnerability Reports using LSTM and Expert System

Authors
Khokhlov, Igor [1]
Okutan, Ahmet [2]
Bryla, Ryan [2]
Simmons, Steven [2]
Mirakhorli, Mehdi [2]
Affiliations
[1] Sacred Heart Univ, Fairfield, CT 06825 USA
[2] Rochester Inst Technol, Rochester, NY USA
Source
2022 IEEE 29TH ANNUAL SOFTWARE TECHNOLOGY CONFERENCE (STC 2022) | 2022
Keywords
Common Product Enumeration; Common Vulnerability and Exposures; Natural Language Processing; Software Product Name Extraction; Software Vulnerability
DOI
10.1109/STC55697.2022.00024
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Software vulnerabilities are closely monitored by the security community so that security and privacy issues in software systems can be addressed in a timely manner. Before a vulnerability is published by vulnerability management systems, it needs to be characterized to highlight its unique attributes, including affected software products and versions, to help security professionals prioritize their patches. Associating product names and versions with disclosed vulnerabilities can be a labor-intensive process that delays their publication and fix, and thereby gives attackers more time to exploit them. This work proposes a machine learning method to automatically extract software product names and versions from unstructured CVE descriptions. It uses Word2Vec and Char2Vec models to create context-aware features from CVE descriptions and uses these features to train a Named Entity Recognition (NER) model based on bidirectional Long Short-Term Memory (LSTM) networks. Based on the attributes of the product names and versions in previously published CVE descriptions, we created a set of Expert System (ES) rules to refine the predictions of the NER model and improve the performance of the developed method. Experimental results on real-life CVE examples indicate that, using the trained NER model and the set of ES rules, software names and versions in unstructured CVE descriptions can be identified with F-measure values above 0.95.
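The rule-based refinement step mentioned in the abstract can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the label scheme or the actual ES rules, so the tags `SN` (software name), `SV` (software version), and `O` (other), the version pattern, the range keywords, and the function `refine_tags` are all hypothetical and are not the authors' implementation.

```python
import re

# Hypothetical label names; the abstract does not state the real tag set.
NAME, VERSION, OTHER = "SN", "SV", "O"

# Assumed pattern for version-like tokens such as "1.0.2", "v2.3", or "1.1.x".
VERSION_RE = re.compile(r"^v?\d+(\.\d+)*(\.x)?$")

# Assumed keywords that often introduce a version range in CVE text.
RANGE_WORDS = {"before", "through"}


def refine_tags(tokens, tags):
    """Return a copy of `tags` after applying two illustrative rules.

    Rule 1: a version-like token directly after a name or version tag
            is relabeled as a version.
    Rule 2: a version-like token that follows a range keyword, which in
            turn follows a name or version tag, is relabeled as a version.
    """
    refined = list(tags)
    for i, tok in enumerate(tokens):
        if not VERSION_RE.match(tok):
            continue
        after_entity = i > 0 and refined[i - 1] in (NAME, VERSION)
        after_range = (
            i > 1
            and tokens[i - 1].lower() in RANGE_WORDS
            and refined[i - 2] in (NAME, VERSION)
        )
        if after_entity or after_range:
            refined[i] = VERSION
    return refined


if __name__ == "__main__":
    tokens = ["Buffer", "overflow", "in", "OpenSSL", "before", "1.0.2"]
    model_tags = ["O", "O", "O", "SN", "O", "O"]  # pretend NER output
    print(refine_tags(tokens, model_tags))
```

In this sketch the rules only ever promote a token to a version label and never demote a model prediction, which keeps the correction step conservative; whether the paper's rules behave this way is not stated in the abstract.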
Pages: 125-134
Page count: 10