ABSTAT-HD: a scalable tool for profiling very large knowledge graphs

Times cited: 5
Authors
Alva Principe, Renzo Arturo [1]
Maurino, Andrea [1]
Palmonari, Matteo [1]
Ciavotta, Michele [1]
Spahiu, Blerina [1]
Affiliation
[1] Univ Milano Bicocca, Dept Informat Syst & Commun, Viale Sarca 336, Milan, Italy
Keywords
Knowledge graph; Data management; Data quality; Data profiling; Distributed processing engine; RDF
DOI
10.1007/s00778-021-00704-2
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Processing large-scale and highly interconnected knowledge graphs (KGs) is becoming crucial for many applications, such as recommender systems and question answering. Profiling approaches have been proposed to summarize large KGs, with the aim of producing concise and meaningful representations that can be easily managed. However, constructing profiles and computing statistics such as cardinality descriptors, or performing inferences, are resource-intensive. In this paper, we present ABSTAT-HD, a highly distributed profiling tool that supports users in profiling and understanding big and complex knowledge graphs. We demonstrate the impact of the new architecture of ABSTAT-HD through a set of experiments that show its scalability with respect to three dimensions of the data to be processed: size, complexity and workload. The experiments show that our profiling framework provides informative and concise profiles and can process and manage very large KGs.
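To make the cost of computing cardinality descriptors more concrete, the following Python sketch profiles a toy set of triples on a single machine. It is an illustration under stated assumptions, not ABSTAT-HD's implementation: it assumes every entity has exactly one known type, groups triples into (subject type, predicate, object type) patterns, and takes a pattern's cardinality descriptor to be the minimum, average and maximum number of distinct objects per subject. The function name profile and the sample data are hypothetical. A distributed engine would partition the same group-by work across machines; here everything runs in memory.

```python
from collections import defaultdict
from statistics import mean


def profile(triples, types):
    """Hypothetical profiler (not ABSTAT-HD's code).

    Groups (s, p, o) triples into (subject type, predicate, object type)
    patterns and, for each pattern, counts its triples and summarizes the
    number of distinct objects per subject. Assumes every entity appearing
    in the triples has exactly one type in `types`.
    """
    frequency = defaultdict(int)
    # pattern -> subject -> set of distinct objects seen with that subject
    objects_by_subject = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        pattern = (types[s], p, types[o])
        frequency[pattern] += 1
        objects_by_subject[pattern][s].add(o)

    result = {}
    for pattern, by_subject in objects_by_subject.items():
        counts = [len(objs) for objs in by_subject.values()]
        # Assumed cardinality descriptor: min/avg/max distinct objects per subject.
        result[pattern] = {
            "frequency": frequency[pattern],
            "min": min(counts),
            "avg": mean(counts),
            "max": max(counts),
        }
    return result


# Toy data, made up for illustration.
triples = [
    ("alice", "worksFor", "acme"),
    ("alice", "worksFor", "globex"),
    ("bob", "worksFor", "acme"),
    ("acme", "locatedIn", "milan"),
]
types = {"alice": "Person", "bob": "Person", "acme": "Company",
         "globex": "Company", "milan": "City"}

summary = profile(triples, types)
for pattern, stats in summary.items():
    print(pattern, stats)
```

For the sample data, the pattern (Person, worksFor, Company) occurs in 3 triples; alice is linked to two companies and bob to one, so its descriptor is min 1, avg 1.5, max 2. On a real KG, the per-subject grouping is what grows expensive, which is why distributing it matters.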
Pages: 851-876
Number of pages: 26