Efficient computation of comprehensive statistical information of large OWL datasets: a scalable approach

Times cited: 2

Authors:

Mohamed, Heba [1, 2]

Fathalla, Said [1, 2]

Lehmann, Jens [1, 3]

Jabeen, Hajira [4]

Affiliations:

[1] Univ Bonn, Smart Data Analyt SDA, Bonn, Germany

[2] Univ Alexandria, Fac Sci, Alexandria, Egypt

[3] Fraunhofer IAIS, Dresden Lab, NetMedia Dept, Dresden, Germany

[4] GESIS Leibniz Inst Social Sci, Cologne, Germany

Source:

ENTERPRISE INFORMATION SYSTEMS | 2023 / Vol. 17 / Issue 7

Keywords:

Distributed processing; in-memory approach; SANSA framework; scalable architecture; Semantic Web; statistics computations; ONTOLOGY; ENTERPRISE; SCALE

DOI:

10.1080/17517575.2022.2062683

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Computing dataset statistics is crucial for exploring a dataset's structure; however, it becomes challenging for large-scale datasets. Such statistics have several key benefits, including link target identification, vocabulary reuse, quality analysis, big data analytics, and coverage analysis. In this paper, we present the first attempt at developing a distributed approach (OWLStats) for collecting comprehensive statistics over large-scale OWL datasets. OWLStats is a distributed in-memory approach that computes 50 statistical criteria for OWL datasets using Apache Spark. We have integrated OWLStats into the SANSA framework. Experimental results show that OWLStats scales linearly with respect to both the number of nodes and the data size.
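The abstract does not describe how OWLStats computes its criteria. As a rough illustration only, assuming that a statistical criterion reduces to additive counts that can be computed locally on each data partition and then merged (the map/reduce pattern that Apache Spark parallelizes), the plain-Python sketch below counts axioms by type and counts declared classes. The toy axiom strings, the function names `axiom_type`, `partition_stats`, and `merge`, and the chosen criteria are all hypothetical; they are not OWLStats' API or its 50 criteria.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy input: OWL axioms in a functional-syntax-like form,
# pre-split into partitions the way a distributed engine (e.g. Spark) holds data.
partitions = [
    ["Declaration(Class(:Person))",
     "Declaration(Class(:Student))",
     "SubClassOf(:Student :Person)"],
    ["Declaration(ObjectProperty(:knows))",
     "Declaration(Class(:Professor))",
     "SubClassOf(:Professor :Person)"],
]


def axiom_type(axiom: str) -> str:
    """Return the axiom constructor name, e.g. 'SubClassOf'."""
    return axiom.split("(", 1)[0]


def partition_stats(axioms: list[str]) -> Counter:
    """Map step: compute partial statistics locally on one partition."""
    stats = Counter(axiom_type(a) for a in axioms)
    # One illustrative (hypothetical) criterion: number of declared OWL classes.
    stats["declared_classes"] = sum(a.startswith("Declaration(Class(") for a in axioms)
    return stats


def merge(a: Counter, b: Counter) -> Counter:
    """Reduce step: partial counts are additive, so merging is a key-wise sum."""
    merged = Counter(a)
    merged.update(b)  # unlike `a + b`, update() keeps zero counts
    return merged


totals = reduce(merge, map(partition_stats, partitions), Counter())
print(dict(totals))
```

Because each partition only emits a small counter, only these summaries need to travel between workers, not the axioms themselves. In PySpark the same idea could be expressed with `RDD.mapPartitions` followed by `RDD.reduce`, though this is not necessarily how OWLStats itself is implemented.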


Pages: 21
