Explainable Similarity of Datasets Using Knowledge Graph

Cited by: 3

Authors:

Skoda, Petr [1]

Klimek, Jakub [1]

Necasky, Martin [1]

Skopal, Tomas [1]

Affiliation:

[1] Charles Univ Prague, Dept Software Engn, Fac Math & Phys, Malostranske Namesti 25, Prague 11800 1, Czech Republic

Source:

SIMILARITY SEARCH AND APPLICATIONS (SISAP 2019) | 2019 / Vol. 11807

Keywords:

Similarity; Datasets; Search; Graph;

DOI:

10.1007/978-3-030-32047-8_10

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

There is a large quantity of datasets available as Open Data on the Web. However, it is challenging for users to find datasets relevant to their needs, even though the datasets are registered in catalogs such as the European Data Portal. This is because the available metadata such as keywords or textual description is not descriptive enough. At the same time, datasets exist in various types of contexts not expressed in the metadata. These may include information about the dataset publisher, the legislation related to dataset publication, language and cultural specifics, etc. In this paper we introduce a similarity model for matching datasets. The model assumes an ontology/knowledge graph, such as Wikidata.org, that serves as a graph-based context to which individual datasets are mapped based on their metadata. A similarity of the datasets is then computed as an aggregation over paths among nodes in the graph. The proposed similarity aims at addressing the problem of explainability of similarity, i.e., providing the user a structured explanation of the match which, in a broader sense, is nowadays a hot topic in the field of artificial intelligence.
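The abstract describes similarity as an aggregation over paths among knowledge-graph nodes to which datasets are mapped, with the paths doubling as an explanation of the match. The record does not specify the aggregation itself, so the following is only a minimal sketch under stated assumptions: the toy graph in `EDGES` is hypothetical (not real Wikidata content), each dataset is assumed to be already mapped to a set of node labels, each node pair is scored as 1 divided by the number of nodes on its shortest path, and the scores are averaged. None of these choices should be read as the authors' actual model.

```python
from collections import deque
from itertools import product

# Hypothetical toy knowledge graph (undirected); NOT real Wikidata content.
EDGES = [
    ("air quality", "pollution"),
    ("pollution", "environment"),
    ("emissions", "pollution"),
    ("traffic", "emissions"),
    ("budget", "finance"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def explain_similarity(graph, nodes_a, nodes_b):
    """Return (score, paths) for two datasets given as sets of graph nodes.

    Assumed aggregation: mean over all node pairs of 1 / len(path),
    with unconnected pairs contributing 0. The paths are the explanation.
    """
    scores, paths = [], []
    for a, b in product(sorted(nodes_a), sorted(nodes_b)):
        path = shortest_path(graph, a, b)
        if path is None:
            scores.append(0.0)
        else:
            scores.append(1.0 / len(path))
            paths.append(path)
    score = sum(scores) / len(scores) if scores else 0.0
    return score, paths
```

For example, with `graph = build_graph(EDGES)`, calling `explain_similarity(graph, {"air quality"}, {"traffic"})` yields a score of 0.25 together with the single connecting path through "pollution" and "emissions", which is the kind of structured explanation a user could inspect.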


Pages: 103-110

Number of pages: 8