Efficient Query Processing on Graph Databases

被引：32

作者：

Cheng, James ^{[1
]}

Ke, Yiping ^{[2
]}

Ng, Wilfred ^{[3
]}

机构：

[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore

[2] Chinese Univ Hong Kong, Dept Syst Engn & Management, Hong Kong, Hong Kong, Peoples R China

[3] Hong Kong Univ Sci & Technol, Clear Water Bay, Hong Kong, Peoples R China

来源：

ACM TRANSACTIONS ON DATABASE SYSTEMS | 2009年 / 34卷 / 01期

关键词：

Algorithms; Experimentation; Performance; Graph databases; graph indexing; graph query processing; frequent subgraphs;

D O I：

10.1145/1508857.1508859

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*-index, to solve this problem. The cost of processing a subgraph query using most existing indexes mainly consists of two parts: the index probing cost and the candidate verification cost. Index probing is to find the query in the index, or to find the graphs from which we can generate a candidate answer set for the query. Candidate verification is to test whether each graph in the candidate set is indeed a supergraph of the query. We design FG*-index to minimize these two costs as follows. FG*-index consists of three components: the FG-index, the feature-index, and the FAQ-index. First, the FG-index employs the concept of Frequent subGraph (FG) to allow the set of queries that are FGs to be answered without candidate verification. We call this set of queries FG-queries. We can enlarge the set of FG-queries so that more queries can be answered without candidate verification; however, a larger set of FG-queries implies a larger FG-index and hence the index probing cost also increases. We propose the feature-index to reduce the index probing cost. The feature-index uses features to filter false results that are matched in the FG-index, so that we can quickly find the truly matching graphs for a query. For processing non-FG-queries, we propose the FAQ-index, which is dynamically constructed from the set of Frequently Asked non-FG-Queries (FAQs). Using the FAQ-index, verification is not required for processing FAQs and only a small number of candidates need to be verified for processing non-FG-queries that are not frequently asked. Finally, a comprehensive set of experiments verifies that query processing using FG*-index is up to orders of magnitude more efficient than state-of-the-art indexes and it is also more scalable.

引用

页数：48

共 39 条

[1]

[Anonymous], P 13 ACM SIGKDD INT

[2]

[Anonymous], 2006, KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

[3]

[Anonymous], P 22 INT C DAT ENG I

[4]

[Anonymous], 2004, Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

[5]

[Anonymous], 2007, P ACM SIGMOD INT C M

[6]

[Anonymous], 1994, Proceedings of the 20th International Conference on Very Large Data Bases, VLDB '94

[7]

Cheng J, 2004, LECT NOTES COMPUT SC, V2992, P219

[8] Maintaining frequent closed itemsets over a sliding window [J].

Cheng, James ;

Ke, Yiping ;

Ng, Wilfred .

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2008, 31 (03) :191-215

[9] A survey on algorithms for mining frequent itemsets over data streams [J].

Cheng, James ;

Ke, Yiping ;

Ng, Wilfred .

KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 16 (01) :1-27

[10] Effective elimination of redundant association rules [J].

Cheng, James ;

Ke, Yiping ;

Ng, Wilfred .

DATA MINING AND KNOWLEDGE DISCOVERY, 2008, 16 (02) :221-249

← 1 2 3 4 →