ContextMiner: Mining Contextual Features for Conceptualizing Knowledge in Security Texts

Cited by: 1

Authors:

Gutierrez, Luis Felipe [1]

Namin, Akbar [1]

Affiliations:

[1] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA

Source:

IEEE ACCESS | 2022 / Vol. 10

Funding:

U.S. National Science Foundation;

Keywords:

Feature extraction; Computer security; Data mining; Syntactics; Natural language processing; Machine learning; Tagging; Dependency parsing; Word embeddings;

DOI:

10.1109/ACCESS.2022.3198944

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

This paper presents ContextMiner, a novel natural language processing (NLP) framework to automatically capture contextual features for the purpose of extracting meaningful context-aware phrases from cybersecurity unstructured textual data. The framework utilizes basic attributes such as part-of-speech tagging, dependency parsing, and a domain-specific grammar to extract the contextual features. The effectiveness and applications of ContextMiner are evaluated and presented from two different perspectives: qualitative and quantitative. As for the qualitative analysis, our case studies show that the proposed framework is capable of retrieving additional contents from the given texts, both in a labeled and unlabeled setting, and thus building context-aware phrases in comparison with existing approaches. From a quantitative point of view, we evaluate ContextMiner as a pre-processing step to perform named entity recognition (NER). Our results show that ContextMiner reduces the corpus up to 70% while maintaining 85% of its relevant entities, with a small drop in the classification metrics. Finally, we explored the utilization of ContextMiner in the construction and reasoning of knowledge graphs.

Pages: 85891-85904

Page count: 14
