Efficient computation of comprehensive statistical information of large OWL datasets: a scalable approach

Cited by: 2
Authors
Mohamed, Heba [1, 2]
Fathalla, Said [1, 2]
Lehmann, Jens [1, 3]
Jabeen, Hajira [4]
Affiliations
[1] Univ Bonn, Smart Data Analyt SDA, Bonn, Germany
[2] Univ Alexandria, Fac Sci, Alexandria, Egypt
[3] Fraunhofer IAIS, Dresden Lab, NetMedia Dept, Dresden, Germany
[4] GESIS Leibniz Inst Social Sci, Cologne, Germany
Keywords
Distributed processing; in-memory approach; SANSA framework; scalable architecture; Semantic Web; statistics computations; ONTOLOGY; ENTERPRISE; SCALE
DOI
10.1080/17517575.2022.2062683
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Computing dataset statistics is crucial for exploring a dataset's structure; however, it becomes challenging for large-scale datasets. Such statistics offer several key benefits, including link target identification, vocabulary reuse, quality analysis, big data analytics, and coverage analysis. In this paper, we present the first attempt at developing a distributed approach (OWLStats) for collecting comprehensive statistics over large-scale OWL datasets. OWLStats is a distributed, in-memory approach that uses Apache Spark to compute 50 statistical criteria for OWL datasets. We have successfully integrated OWLStats into the SANSA framework. Experimental results show that OWLStats scales linearly in terms of both node scalability and data scalability.
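The abstract describes computing statistics over OWL data in a distributed, in-memory way with Apache Spark. The sketch below only illustrates the general partition, map, and merge pattern that such a computation typically follows. It is not OWLStats' implementation. It runs in a single Python process, and the toy axioms, the helper names (`map_partition`, `partition`, `compute_stats`), and the two example criteria (axiom count per axiom type, number of declared classes) are hypothetical choices made for illustration. In Spark, the partitions would live on different executors, and the merge would be done by an operation such as `reduceByKey`.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy ontology, one axiom per line in OWL functional-style syntax.
AXIOMS = [
    "Declaration(Class(:Person))",
    "Declaration(Class(:Student))",
    "Declaration(ObjectProperty(:enrolledIn))",
    "SubClassOf(:Student :Person)",
    "ObjectPropertyDomain(:enrolledIn :Student)",
    "ClassAssertion(:Student :alice)",
]


def axiom_type(line: str) -> str:
    """Return the functional-syntax keyword that precedes the first '('."""
    return line.split("(", 1)[0].strip()


def map_partition(part):
    """Map step (hypothetical criteria): compute partial counts for one partition."""
    stats = Counter()
    for line in part:
        stats["axiom:" + axiom_type(line)] += 1
        if line.startswith("Declaration(Class("):
            stats["declared_classes"] += 1
    return stats


def partition(data, n):
    """Split the input into n interleaved chunks, standing in for RDD partitions."""
    return [data[i::n] for i in range(n)]


def compute_stats(data, num_partitions=3):
    """Run the map step on each partition, then merge the partial Counters."""
    partials = map(map_partition, partition(data, num_partitions))
    # Reduce step: Counter addition sums the counts key by key.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    for key, value in sorted(compute_stats(AXIOMS).items()):
        print(key, value)
```

Because each partial result is a count and addition is associative and commutative, the merged statistics do not depend on how the axioms are partitioned. This property is what allows a criterion of this kind to be spread across nodes.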
Pages: 21