Ontology-Driven Data Semantics Discovery for Cyber-Security

Cited by: 8
Authors
Balduccini, Marcello [1]
Kushner, Sarah [1]
Speck, Jacquelin [1]
Affiliation
[1] Drexel University, College of Computing & Informatics, Philadelphia, PA 19104, USA
Source
Practical Aspects of Declarative Languages (PADL 2015) | 2015 / Vol. 9131
Keywords
Data semantics discovery; Ontologies; Machine learning; Cyber-security
DOI
10.1007/978-3-319-19686-2_1
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
We present an architecture for data semantics discovery capable of extracting semantically rich content from human-readable files without prior specification of the file format. The architecture, based on work at the intersection of knowledge representation and machine learning, includes machine learning modules for automatic file format identification, tokenization, and entity identification. The process is driven by an ontology of domain-specific concepts. The ontology also provides an abstraction layer for querying the extracted data. We provide a general description of the architecture as well as details of the current implementation. Although the architecture can be applied in a variety of domains, we focus on cyber-forensics applications, aiming to allow one to parse data sources, such as log files, for which there are no readily available parsing and analysis tools, and to aggregate and query data from multiple, diverse systems across large networks. The key contributions of our work are: the development of an architecture that constitutes a substantial step toward solving a highly practical open problem; the creation of one of the first comprehensive ontologies of cyber assets; and the development and demonstration of an innovative, non-trivial combination of declarative knowledge specification and machine learning.
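The abstract describes the pipeline only at a high level. As an illustration, here is a minimal Python sketch, under our own assumptions, of how an is-a ontology can drive entity identification and then act as a format-independent query layer. Everything named here is hypothetical and not taken from the paper: the concept names in `ONTOLOGY`, the regular-expression recognizers (a crude stand-in for the learned tokenization and entity-identification modules), and the functions `tokenize`, `identify`, `ancestors`, `extract`, and `query`. File format identification is omitted entirely. This is not the authors' implementation.

```python
import re
from collections import defaultdict

# Hypothetical mini-ontology: concept -> parent concept (is-a hierarchy).
ONTOLOGY = {
    "IPv4Address": "NetworkAddress",
    "NetworkAddress": "CyberAsset",
    "Timestamp": "TemporalValue",
    "Username": "Account",
    "Account": "CyberAsset",
}

# Hypothetical recognizers standing in for a learned entity identifier:
# each maps a leaf ontology concept to a regular expression. A named
# group "v" marks the part of the token to keep as the value.
RECOGNIZERS = {
    "Timestamp": re.compile(r"^\d{2}:\d{2}:\d{2}$"),
    "IPv4Address": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
    "Username": re.compile(r"^user=(?P<v>\w+)$"),
}


def tokenize(line):
    """Split a line on whitespace (a stand-in for learned tokenization)."""
    return line.split()


def identify(token):
    """Return (concept, value) for the first matching recognizer, else None."""
    for concept, pattern in RECOGNIZERS.items():
        m = pattern.match(token)
        if m:
            # Use the named group "v" if present, otherwise the whole token.
            return concept, m.groupdict().get("v", token)
    return None


def ancestors(concept):
    """Yield the concept itself and every ancestor along the is-a chain."""
    while concept is not None:
        yield concept
        concept = ONTOLOGY.get(concept)


def extract(lines):
    """Index every identified value under its concept and all its ancestors."""
    index = defaultdict(list)
    for line in lines:
        for token in tokenize(line):
            hit = identify(token)
            if hit is None:
                continue
            concept, value = hit
            for c in ancestors(concept):
                index[c].append(value)
    return index


def query(index, concept):
    """Query by ontology concept, independent of the source file's format."""
    return index.get(concept, [])


if __name__ == "__main__":
    log = [
        "10:15:02 login user=alice from 192.168.1.7",
        "10:16:40 logout user=alice",
    ]
    idx = extract(log)
    print(query(idx, "CyberAsset"))  # ['alice', '192.168.1.7', 'alice']
```

Because values are indexed under every ancestor concept, a query for a general concept such as `CyberAsset` returns both accounts and network addresses without the caller knowing how any particular log file was laid out. This is one simple way to read "the ontology provides an abstraction layer for querying"; the paper's actual mechanism may differ.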
Pages: 1-16
Number of pages: 16