Automated Extraction of Software Names from Vulnerability Reports using LSTM and Expert System

Authors
Khokhlov, Igor [1]
Okutan, Ahmet [2]
Bryla, Ryan [2]
Simmons, Steven [2]
Mirakhorli, Mehdi [2]
Affiliations
[1] Sacred Heart Univ, Fairfield, CT 06825 USA
[2] Rochester Inst Technol, Rochester, NY USA
Source
2022 IEEE 29TH ANNUAL SOFTWARE TECHNOLOGY CONFERENCE (STC 2022) | 2022
Keywords
Common Product Enumeration; Common Vulnerability and Exposures; Natural Language Processing; Software Product Name Extraction; Software Vulnerability
DOI
10.1109/STC55697.2022.00024
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Software vulnerabilities are closely monitored by the security community so that security and privacy issues in software systems can be addressed in a timely manner. Before a vulnerability is published by a vulnerability management system, it needs to be characterized to highlight its unique attributes, including the affected software products and versions, to help security professionals prioritize their patches. Associating product names and versions with disclosed vulnerabilities can be a labor-intensive process that delays publication and fixes, and thereby gives attackers more time to exploit the vulnerabilities. This work proposes a machine learning method to automatically extract software product names and versions from unstructured CVE descriptions. It uses Word2Vec and Char2Vec models to create context-aware features from CVE descriptions and uses these features to train a Named Entity Recognition (NER) model based on bidirectional Long Short-Term Memory (LSTM) networks. Based on the attributes of the product names and versions in previously published CVE descriptions, we created a set of Expert System (ES) rules to refine the predictions of the NER model and improve the performance of the developed method. Experimental results on real-life CVE examples indicate that, using the trained NER model and the set of ES rules, software names and versions in unstructured CVE descriptions can be identified with F-measure values above 0.95.
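The rule-based refinement step mentioned in the abstract can be pictured with a small sketch. The abstract does not list the actual ES rules or the NER label set, so everything below is an assumption made for illustration: the tag names `SN`, `SV`, and `O`, the version pattern, the range-word list, and the two rules themselves are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical tag names: SN = software name, SV = software version, O = other.
# The paper's real label set is not given in the abstract.
SN, SV, O = "SN", "SV", "O"

# Assumed shape of a version token, e.g. "1.0.2", "v2", "4.3b".
VERSION_RE = re.compile(r"^v?\d+(\.\d+)*[a-z]?$", re.IGNORECASE)

# Assumed connective words that may sit between a product name and a version.
RANGE_WORDS = {"before", "through", "and", "to", "prior"}


def refine(tokens, tags):
    """Apply two illustrative post-processing rules to NER output."""
    refined = list(tags)
    for i, tok in enumerate(tokens):
        if refined[i] == O and VERSION_RE.match(tok):
            # Rule 1 (hypothetical): a version-like token that follows a
            # software name or version, possibly across range words,
            # is relabeled as a version.
            j = i - 1
            while j >= 0 and tokens[j].lower() in RANGE_WORDS:
                j -= 1
            if j >= 0 and refined[j] in (SN, SV):
                refined[i] = SV
        elif refined[i] == SV and not any(c.isdigit() for c in tok):
            # Rule 2 (hypothetical): a predicted version without any digit
            # is treated as a false positive and demoted.
            refined[i] = O
    return refined


if __name__ == "__main__":
    tokens = ["Buffer", "overflow", "in", "OpenSSL", "before", "1.0.2",
              "and", "1.1.0", "allows", "remote", "attackers"]
    # Made-up NER predictions containing one miss and one false positive.
    tags = [O, O, O, SN, O, O, O, SV, O, SV, O]
    for tok, tag in zip(tokens, refine(tokens, tags)):
        print(f"{tok}\t{tag}")
```

In a pipeline of the kind the abstract describes, a function like `refine` would run on the output of the trained bidirectional LSTM tagger; here the predictions are hand-written so the sketch stays self-contained.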
Pages: 125-134
Page count: 10