RDFFrames: knowledge graph access for machine learning tools

Cited by: 0
Authors
Aisha Mohamed
Ghadeer Abuoda
Abdurrahman Ghanem
Zoi Kaoudi
Ashraf Aboulnaga
Affiliations
[1] Qatar Computing Research Institute
[2] HBKU
[3] College of Science and Engineering
[4] HBKU
[5] Bluescape
[6] Technische Universität Berlin
Source
The VLDB Journal | 2022 / Vol. 31
Keywords
Knowledge graphs; RDF; SPARQL; PyData; Data preparation; Machine learning
DOI
Not available
Abstract
Knowledge graphs represented as RDF datasets are integral to many machine learning applications. RDF is supported by a rich ecosystem of data management systems and tools, most notably RDF database systems that provide a SPARQL query interface. Surprisingly, machine learning tools for knowledge graphs do not use SPARQL, despite the obvious advantages of using a database system. This is due to the mismatch between SPARQL and machine learning tools in terms of data model and programming style. Machine learning tools work on data in tabular format and process it using an imperative programming style, while SPARQL is declarative and has as its basic operation matching graph patterns to RDF triples. We posit that a good interface to knowledge graphs from a machine learning software stack should use an imperative, navigational programming paradigm based on graph traversal rather than the SPARQL query paradigm based on graph patterns. In this paper, we present RDFFrames, a framework that provides such an interface. RDFFrames provides an imperative Python API that gets internally translated to SPARQL, and it is integrated with the PyData machine learning software stack. RDFFrames enables the user to make a sequence of Python calls to define the data to be extracted from a knowledge graph stored in an RDF database system, and it translates these calls into a compact SPARQL query, executes it on the database system, and returns the results in a standard tabular format. Thus, RDFFrames is a useful tool for data preparation that combines the usability of PyData with the flexibility and performance of RDF database systems.
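The idea of turning a sequence of imperative, navigational calls into one SPARQL query can be illustrated with a toy sketch. The class below is hypothetical and is not the RDFFrames API: the names `ToyFrame`, `entities`, `expand`, `filter`, and `to_sparql`, as well as the prefixed names in the usage, are assumptions made for illustration only. The sketch only builds a query string; it does not execute it against a database or return a table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyFrame:
    """Hypothetical navigational frame; NOT the RDFFrames API."""
    columns: List[str] = field(default_factory=list)
    patterns: List[str] = field(default_factory=list)
    filters: List[str] = field(default_factory=list)

    @staticmethod
    def entities(class_uri: str, new_column: str) -> "ToyFrame":
        # Seed the frame with all instances of a class.
        return ToyFrame([new_column], [f"?{new_column} rdf:type {class_uri} ."], [])

    def expand(self, src_column: str, predicate: str, new_column: str) -> "ToyFrame":
        # Navigate one edge from an existing column to a new column.
        if src_column not in self.columns:
            raise KeyError(f"unknown column: {src_column}")
        return ToyFrame(
            self.columns + [new_column],
            self.patterns + [f"?{src_column} {predicate} ?{new_column} ."],
            list(self.filters),
        )

    def filter(self, condition: str) -> "ToyFrame":
        # Record a condition on the columns collected so far.
        return ToyFrame(
            list(self.columns),
            list(self.patterns),
            self.filters + [f"FILTER ({condition})"],
        )

    def to_sparql(self) -> str:
        # Collapse all recorded steps into a single SELECT query.
        # PREFIX declarations are omitted; a real query would need them.
        head = "SELECT " + " ".join(f"?{c}" for c in self.columns)
        body = "\n".join("  " + line for line in self.patterns + self.filters)
        return f"{head}\nWHERE {{\n{body}\n}}"


# Usage: each call is one navigation step; the prefixed names are illustrative.
frame = (
    ToyFrame.entities("dbpo:Film", "movie")
    .expand("movie", "dbpp:starring", "actor")
    .expand("actor", "dbpp:birthPlace", "country")
    .filter("?country = dbpr:United_States")
)
query = frame.to_sparql()
print(query)
```

Each call returns a new frame, so the user writes ordinary Python step by step, and only the final call produces a single query string that a database system could evaluate.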
Pages: 321-346
Page count: 25