Answering Why-Questions for Subgraph Queries

被引:3
作者
Song, Qi [1 ]
Namaki, Mohammad Hossein [1 ]
Lin, Peng [1 ]
Wu, Yinghui [2 ]
机构
[1] Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99163 USA
[2] Case Western Reserve Univ, Dept Comp & Data Sci, Cleveland, OH 44106 USA
关键词
Approximation algorithms; Computational modeling; Heuristic algorithms; Social networking (online); Query processing; Knowledge engineering; Knowledge based systems; Data provenance; query processing; data exploration; graph data; FRAMEWORK;
D O I
10.1109/TKDE.2020.3046436
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Subgraph queries are routinely used to search for entities in richly attributed graphs e.g., social networks and knowledge graphs. With little knowledge of underlying data, users often need to rewrite queries multiple times to reach desirable answers. Why-questions are studied to clarify missing or unexpected query results. This paper makes a first step to answer Why-questions for entity search in attributed graphs. We consider three common types of Why-questions: Why-not, Why, and Why-rank, which suggest query manipulations that are responsible for user-specified missing, unexpected, and undesirably ranked entities, respectively. (1) We approach a general query rewriting paradigm that suggests to identify desired entities that are specified by Why-questions. We introduce measures that characterize good query rewrites by incorporating both query editing cost and answer closeness. (2) While computing optimal query rewrites is intractable, we develop feasible algorithms, from approximation to fast heuristics, and provide query rewrites with (near) optimality guarantees whenever possible, for Why, Why-not and Why-rank questions. We further show that our results remain intact for Why questions that (1) request a single query rewrite to clarify multiple types of entities, and (2) variants such as Why-empty and Why-so-many, by providing the matching algorithms. Using real-world graphs, we experimentally verify that our algorithms are effective and feasible for large graphs. Our case study also verifies their application in e.g., knowledge exploration.
引用
收藏
页码:4636 / 4649
页数:14
相关论文
共 39 条
  • [1] An analytical study of large SPARQL query logs
    Bonifati, Angela
    Martens, Wim
    Timm, Thomas
    [J]. VLDB JOURNAL, 2020, 29 (2-3) : 655 - 679
  • [2] Buneman P, 2001, LECT NOTES COMPUT SC, V1973, P316
  • [3] Chapman A, 2009, ACM SIGMOD/PODS 2009 CONFERENCE, P523
  • [4] Chen L, 2016, PROC INT CONF DATA, P767, DOI 10.1109/ICDE.2016.7498288
  • [5] Cortadella Jordi, 1999, PROC 5 INT SEMINAR R
  • [6] Querying Big Graphs within Bounded Resources
    Fan, Wenfei
    Wang, Xin
    Wu, Yinghui
    [J]. SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 301 - 312
  • [7] Diversified Top-k Graph Pattern Matching
    Fan, Wenfei
    Wang, Xin
    Wu, Yinghui
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (13): : 1510 - 1521
  • [8] Gomaa Wael H, 2013, International Journal of Computer Applications, V68, P13, DOI DOI 10.5120/11638-7118
  • [9] Answering Why-Not Questions on Top-K Queries
    He, Zhian
    Lo, Eric
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (06) : 1300 - 1315
  • [10] A Survey of Top-k Query Processing Techniques in Relational Database Systems
    Ilyas, Ihab F.
    Beskales, George
    Soliman, Mohamed A.
    [J]. ACM COMPUTING SURVEYS, 2008, 40 (04)