Partition-then-Overlap Method for Labeling Cyber Threat Intelligence Reports by Topics over Time

Cited by: 2
Authors
Nagasawa, Ryusei [1]
Furumoto, Keisuke [2]
Takita, Makoto [3]
Shiraishi, Yoshiaki [1, 4]
Takahashi, Takeshi [2]
Mohri, Masami [5]
Takano, Yasuhiro [1]
Morii, Masakatu [1]
Affiliations
[1] Kobe Univ, Dept Elect & Elect Engn, Kobe, Hyogo 6578501, Japan
[2] Natl Inst Informat & Commun Technol, Koganei, Tokyo 1848795, Japan
[3] Univ Hyogo, Sch Social Informat Sci, Kobe, Hyogo 6512197, Japan
[4] Kobe Univ, Ctr Math & Data Sci, Kobe, Hyogo 6578501, Japan
[5] Gifu Univ, Dept Elect Elect & Comp Engn, Gifu 5011193, Japan
Keywords
topic model; cyber threat intelligence; text mining; multi-labeling; security blog posts; EXTRACTION;
DOI
10.1587/transinf.2020DAL0002
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The Topics over Time (TOT) model lets users track how particular topics change over time. The proposed method partitions a dataset of security blog posts into fixed-length periods, lets adjacent periods overlap, and inputs each partition to the TOT model. The results suggest that the method extracts topics containing malware and attack campaign names, and that these topics are appropriate for multi-labeling cyber threat intelligence reports.
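The partition-then-overlap step described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function name `partition_with_overlap`, the `(date, text)` tuple layout, and the window and overlap lengths are all assumptions, and fitting the TOT model on each window is left out.

```python
from datetime import date, timedelta


def partition_with_overlap(posts, period_days, overlap_days):
    """Split (date, text) posts into fixed-length windows that overlap.

    Hypothetical helper: each window covers [start, start + period_days),
    and consecutive windows share their last/first `overlap_days` days.
    """
    if not 0 <= overlap_days < period_days:
        raise ValueError("overlap_days must be in [0, period_days)")
    if not posts:
        return []

    posts = sorted(posts, key=lambda p: p[0])
    start, last = posts[0][0], posts[-1][0]
    length = timedelta(days=period_days)
    step = timedelta(days=period_days - overlap_days)

    windows = []
    while start <= last:
        end = start + length
        # A post inside an overlap region is assigned to both windows.
        docs = [p for p in posts if start <= p[0] < end]
        windows.append((start, end, docs))
        start += step
    return windows


# Illustrative data; dates and texts are made up.
sample_posts = [
    (date(2020, 1, 1), "post a"),
    (date(2020, 1, 25), "post b"),
    (date(2020, 2, 15), "post c"),
    (date(2020, 2, 25), "post d"),
]
sample_windows = partition_with_overlap(sample_posts, period_days=30, overlap_days=10)
```

Each returned window would then be passed to a topic model separately. With the illustrative values above, "post b" appears in both the first and second windows, which is what carries topics across partition boundaries.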
Pages: 556-561
Page count: 6