STHist-C: a highly accurate cluster-based histogram for two and three dimensional geographic data points

Cited by: 2
Authors
Hai Thanh Mai [1]
Kim, Jaeho [1]
Roh, Yohan J. [2]
Kim, Myoung Ho [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
[2] Samsung Elect, Samsung Adv Inst Technol, Yongin 446712, Gyeonggi Do, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Spatial databases; Geographic Information Systems; Query optimization; Histograms; Selectivity estimation;
DOI
10.1007/s10707-012-0154-y
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Histograms have been widely used for estimating selectivity in query optimization. In this paper, we propose a new histogram construction method for geographic data objects that are used in many real-world applications. The proposed method analyzes and exploits the clusters of objects that exist in a given data set to build histograms with significantly enhanced accuracy. Our philosophy in allocating the histogram buckets is to allocate them to the subspaces that properly capture object clusters. Therefore, we first propose a procedure to find the centers of object clusters. Then, we propose an algorithm to construct the histogram buckets from these centers. The buckets are initialized from the clusters' centers, then expanded to cover the clusters. The best expansion plans are chosen based on a notion of skewness gain. Results from extensive experiments using real-life data sets demonstrate that the proposed method improves histogram accuracy over the current state-of-the-art histogram construction method for geographic data objects.
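To make concrete how a bucket-based histogram is used for selectivity estimation, the following is a minimal sketch in Python. It is not the STHist-C construction algorithm: the abstract does not define skewness gain or the center-finding procedure, so neither is reproduced here. The sketch only shows the generic estimate for a rectangular range query, assuming points are spread uniformly inside each bucket. The names `Bucket`, `overlap_fraction`, and `estimate_count` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Axis-aligned 2D rectangle plus the number of points it summarizes.

    Hypothetical structure for illustration; not taken from the paper.
    """
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    count: int


def overlap_fraction(b, qx_min, qy_min, qx_max, qy_max):
    """Return the fraction of the bucket's area covered by the query rectangle."""
    w = min(b.x_max, qx_max) - max(b.x_min, qx_min)
    h = min(b.y_max, qy_max) - max(b.y_min, qy_min)
    if w <= 0 or h <= 0:
        # No overlap (also covers zero-area buckets).
        return 0.0
    return (w * h) / ((b.x_max - b.x_min) * (b.y_max - b.y_min))


def estimate_count(buckets, query):
    """Estimate how many points fall in query = (x_min, y_min, x_max, y_max).

    Assumes points are uniformly distributed inside each bucket, so each
    bucket contributes its count scaled by the overlapped area fraction.
    """
    return sum(b.count * overlap_fraction(b, *query) for b in buckets)


if __name__ == "__main__":
    buckets = [Bucket(0, 0, 10, 10, 100), Bucket(20, 20, 30, 30, 50)]
    print(estimate_count(buckets, (5, 5, 25, 25)))
```

The uniformity assumption is why bucket placement matters: the closer each bucket is to a region of even density, such as a single object cluster, the smaller the error this estimate incurs.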
Pages: 325-352
Number of pages: 28