Top-k differential queries in graph databases

被引:4
作者
Vasilyeva, Elena [1 ]
Thiele, Maik [2 ]
Bornhövd, Christof [3 ]
Lehner, Wolfgang [2 ]
机构
[1] SAP AG, Chemnitzer Str. 48, Dresden
[2] Technische Universität Dresden, Database Technology Group, Nöthnitzer Str. 46, Dresden
[3] SAP Labs, LLC, Palo Alto
来源
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 2014年 / 8716卷
关键词
Flooding; Graph databases; Top-k Differential Queries;
D O I
10.1007/978-3-319-10933-6_9
中图分类号
学科分类号
摘要
The sheer volume as well as the schema complexity of today’s graph databases impede the users in formulating queries against these databases and often cause queries to “fail”by delivering empty answers. To support users in such situations, the concept of differential queries can be used to bridge the gap between an unexpected result (e.g. an empty result set) and the query intention of users. These queries deliver missing parts of a query graph and, therefore, work with such scenarios that require users to specify a query graph. Based on the discovered information about a missing query subgraph, users may understand which vertices and edges are the reasons for queries that unexpectedly return empty answers, and thus can reformulate the queries if needed. A study showed that the result sets of differential queries are often too large to be manually introspected by users and thus a reduction of the number of results and their ranking is required. To address these issues, we extend the concept of differential queries and introduce top-k differential queries that calculate the ranking based on users’ preferences and therefore significantly support the users’ understanding of query database management systems. The idea consists of assigning relevance weights to vertices or edges of a query graph by users that steer the graph search and are used in the scoring function for top-k differential results. Along with the novel concept of the top-k differential queries, we further propose a strategy for propagating relevance weights and we model the search along the most relevant paths. © Springer International Publishing Switzerland 2014.
引用
收藏
页码:112 / 115
页数:3
相关论文
共 17 条
  • [1] Amer-Yahia S., Koudas N., Marian A., Srivastava D., Toman D., Structure and Content Scoring for XML, Proc. Of VLDB Endow, pp. 361-372, (2005)
  • [2] Bornhovd C., Kubis R., Lehner W., Voigt H., Werner H., Flexible Information Management, Exploration and Analysis in SAP HANA, DATA, pp. 15-28, (2012)
  • [3] Chapman A., Jagadish H.V., Why not?, Proc. Of ACM SIGMOD, pp. 523-534, (2009)
  • [4] Fan W., Wang X., Wu Y., Diversified top-k graph pattern matching, Proc. Of VLDB Endow, 6, pp. 1510-1521, (2013)
  • [5] Gupta M., Gao J., Yan X., Cam H., Han J., Top-k interesting subgraph discovery in information networks, Proc. Of ICDE, pp. 820-831, (2014)
  • [6] Huang J., Chen T., Doan A., Naughton J.F., On the provenance of non-answers to queries over extracted data, Proc. Of VLDB Endow, 1, pp. 736-747, (2008)
  • [7] Kasneci G., Suchanek F., Ifrim G., Ramanath M., Weikum G., Naga: Searching and ranking knowledge, Proc. Of ICDE, pp. 953-962, (2008)
  • [8] McGregor J.J., Backtrack search algorithms and the maximal common subgraph problem, Software: Practice and Experience, 12, 1, pp. 23-34, (1982)
  • [9] Melnik S., Garcia-Molina H., Rahm E., Similarity flooding: A versatile graph matching algorithm and its application to schema matching, Proc. Of ICDE, pp. 117-128, (2002)
  • [10] Mottin D., Marascu A., Roy S.B., Das G., Palpanas T., Velegrakis Y., A probabilistic optimization framework for the empty-answer problem, Proc. Of VLDB Endow, 6, pp. 1762-1773, (2013)