Density-based data clustering algorithms for lower dimensions using space-filling curves

被引:0
作者
Xu, Bin [1 ]
Chen, Danny Z. [1 ]
机构
[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
来源
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS | 2007年 / 4426卷
关键词
density-based clustering; secondary memory management; space-filling curves; multi-attribute hashing;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present two new density-based algorithms for clustering data points in lower dimensions (dimensions <= 10). Both our algorithms compute density-based clusters and noises in O(n) CPU time, space, and I/O cost, under some reasonable assumptions, where n is the number of input points. Besides packing the data structure into buckets and using block access techniques to reduce the I/O cost, our algorithms apply space-filling curve techniques to reduce the disk access operations. Our first algorithm (Algorithm A) focuses on handling not highly clustered input data, while the second algorithm (Algorithm B) focuses on highly clustered input data. We implemented our algorithms, evaluated the effects of various space-filling curves, identified the best space-filling curve for our approaches, and conducted extensive performance evaluation. The experiments show the high performance of our algorithms. We believe that our algorithms are of considerable practical value.
引用
收藏
页码:997 / +
页数:3
相关论文
共 10 条
[1]  
Ankerst M, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P49
[2]  
Arya S., 1997, 2 CGC WORKSH COMP GE
[3]   Geometric algorithms for density-based data clustering [J].
Chen, DZ ;
Smid, M ;
Xu, B .
INTERNATIONAL JOURNAL OF COMPUTATIONAL GEOMETRY & APPLICATIONS, 2005, 15 (03) :239-260
[4]   GRAY CODES FOR PARTIAL MATCH AND RANGE QUERIES [J].
FALOUTSOS, C .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1988, 14 (10) :1381-1393
[5]  
Hinneburg A., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P58
[6]  
JAGADISH HV, 1990, SIGMOD REC, V19, P332, DOI 10.1145/93605.98742
[7]   Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications [J].
Sander, J ;
Ester, M ;
Kriegel, HP ;
Xu, XW .
DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (02) :169-194
[8]  
VITTER JS, 1999, P ANN INT C AUT LANG, V1644, P119
[9]   A distribution-based clustering algorithm for mining in large spatial databases [J].
Xu, XW ;
Ester, M ;
Kriegel, HP ;
Sander, J .
14TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1998, :324-331
[10]  
ZHOU B, 1999, P 3 PAC AS C METH KN, P338