Efficient Query Processing on Graph Databases

被引:32
作者
Cheng, James [1 ]
Ke, Yiping [2 ]
Ng, Wilfred [3 ]
机构
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
[2] Chinese Univ Hong Kong, Dept Syst Engn & Management, Hong Kong, Hong Kong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Clear Water Bay, Hong Kong, Peoples R China
来源
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2009年 / 34卷 / 01期
关键词
Algorithms; Experimentation; Performance; Graph databases; graph indexing; graph query processing; frequent subgraphs;
D O I
10.1145/1508857.1508859
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*-index, to solve this problem. The cost of processing a subgraph query using most existing indexes mainly consists of two parts: the index probing cost and the candidate verification cost. Index probing is to find the query in the index, or to find the graphs from which we can generate a candidate answer set for the query. Candidate verification is to test whether each graph in the candidate set is indeed a supergraph of the query. We design FG*-index to minimize these two costs as follows. FG*-index consists of three components: the FG-index, the feature-index, and the FAQ-index. First, the FG-index employs the concept of Frequent subGraph (FG) to allow the set of queries that are FGs to be answered without candidate verification. We call this set of queries FG-queries. We can enlarge the set of FG-queries so that more queries can be answered without candidate verification; however, a larger set of FG-queries implies a larger FG-index and hence the index probing cost also increases. We propose the feature-index to reduce the index probing cost. The feature-index uses features to filter false results that are matched in the FG-index, so that we can quickly find the truly matching graphs for a query. For processing non-FG-queries, we propose the FAQ-index, which is dynamically constructed from the set of Frequently Asked non-FG-Queries (FAQs). Using the FAQ-index, verification is not required for processing FAQs and only a small number of candidates need to be verified for processing non-FG-queries that are not frequently asked. Finally, a comprehensive set of experiments verifies that query processing using FG*-index is up to orders of magnitude more efficient than state-of-the-art indexes and it is also more scalable.
引用
收藏
页数:48
相关论文
共 39 条
[1]  
[Anonymous], P 13 ACM SIGKDD INT
[2]  
[Anonymous], 2006, KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
[3]  
[Anonymous], P 22 INT C DAT ENG I
[4]  
[Anonymous], 2004, Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
[5]  
[Anonymous], 2007, P ACM SIGMOD INT C M
[6]  
[Anonymous], 1994, Proceedings of the 20th International Conference on Very Large Data Bases, VLDB '94
[7]  
Cheng J, 2004, LECT NOTES COMPUT SC, V2992, P219
[8]   Maintaining frequent closed itemsets over a sliding window [J].
Cheng, James ;
Ke, Yiping ;
Ng, Wilfred .
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2008, 31 (03) :191-215
[9]   A survey on algorithms for mining frequent itemsets over data streams [J].
Cheng, James ;
Ke, Yiping ;
Ng, Wilfred .
KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 16 (01) :1-27
[10]   Effective elimination of redundant association rules [J].
Cheng, James ;
Ke, Yiping ;
Ng, Wilfred .
DATA MINING AND KNOWLEDGE DISCOVERY, 2008, 16 (02) :221-249