Quality assessment of cyber threat intelligence knowledge graph based on adaptive joining of embedding model

Cited by: 1
Authors
Chen, Bin [1]
Li, Hongyi [2]
Zhao, Di [2]
Yang, Yitang [3]
Pan, Chengwei [4,5]
Affiliations
[1] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Math Sci, Beijing 100191, Peoples R China
[3] Beihang Univ, Sch Software, Beijing 100191, Peoples R China
[4] Beihang Univ, Sch Artificial Intelligence, Beijing 100191, Peoples R China
[5] Zhongguancun Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Knowledge graph; Quality assessment; Reinforcement learning; Cyber threat intelligence;
DOI
10.1007/s40747-024-01661-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In research on cyber threat intelligence knowledge graphs, a current challenge is that graph triples may be erroneous, inconsistent, or missing, which makes it difficult for the graphs to meet complex and diverse application requirements. The predominant approach to knowledge graph quality assessment uses word embeddings to evaluate the plausibility of triples. Recent studies have found that splicing (concatenating) different types of embeddings yields better word representations, and have applied this to tasks such as named entity recognition (NER). However, as the number of embedding types grows, selecting the best embeddings to splice has become a pressing problem. In this paper, we propose an adaptive joining of embedding (AJE) model that automatically finds better word embedding representations for knowledge graph quality assessment. The AJE model operates through the interplay of a selector and a task model: the selector samples word embeddings generated by various models, and rewards derived from the task model's results on the current task decide whether each embedding is spliced in. Experiments were conducted on two generic datasets and one cybersecurity dataset for knowledge graph quality assessment. The results show that our model outperforms the baseline models on key metrics such as accuracy and F1, reaching accuracies of 95.8%, 95.6% and 91.3% on the generic datasets WN11 and FB13 and the cybersecurity dataset CS13K, respectively, which are increases of 1.0%, 0.2% and 0.5% over the AttTucker model.
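The select-and-splice loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Bernoulli keep/drop policy, the REINFORCE-style update with a moving-average baseline, and every name here (`splice`, `sample_mask`, `train_selector`, `toy_task_score`) are assumptions made for illustration. The toy score stands in for a real task model's validation accuracy.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def splice(embeddings, mask):
    """Concatenate the embedding vectors whose mask bit is 1."""
    out = []
    for vec, keep in zip(embeddings, mask):
        if keep:
            out.extend(vec)
    return out


def sample_mask(logits, rng):
    """Sample one keep/drop bit per embedding source.

    If every bit comes out 0, one source is forced on so the spliced
    vector is never empty (an assumption of this sketch).
    """
    mask = [1 if rng.random() < sigmoid(l) else 0 for l in logits]
    if not any(mask):
        mask[rng.randrange(len(mask))] = 1
    return mask


def train_selector(logits, task_score, steps=500, lr=0.5, seed=0):
    """REINFORCE-style update of the selector's logits.

    task_score(mask) is a hypothetical callback returning the task
    model's score for the embeddings chosen by mask.
    """
    rng = random.Random(seed)
    logits = list(logits)
    baseline = 0.0
    for _ in range(steps):
        mask = sample_mask(logits, rng)
        advantage = task_score(mask) - baseline
        baseline += 0.1 * advantage  # moving-average baseline
        for i, bit in enumerate(mask):
            # Gradient of the Bernoulli log-probability w.r.t. the logit.
            logits[i] += lr * advantage * (bit - sigmoid(logits[i]))
    return logits


def toy_task_score(mask):
    """Hypothetical score: sources 0 and 2 help, source 1 hurts."""
    return 0.5 + 0.2 * mask[0] - 0.2 * mask[1] + 0.2 * mask[2]


if __name__ == "__main__":
    learned = train_selector([0.0, 0.0, 0.0], toy_task_score)
    print([round(sigmoid(l), 2) for l in learned])
```

In a real setting `task_score` would train or evaluate a triple classifier on the vector returned by `splice`, which is far more expensive than this toy function; the sketch only shows how reward feedback can shift the probability of keeping each embedding.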
Pages: 14