A fast implementation of the ISODATA clustering algorithm

Cited by: 141
Authors
Memarsadeghi, Nargess [1]
Mount, David M.
Netanyahu, Nathan S.
Le Moigne, Jacqueline
Affiliations
[1] NASA, Goddard Space Flight Ctr, Adv Architectures & Automat Branch, Greenbelt, MD 20771 USA
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[3] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[4] Univ Maryland, Inst Adv Comp Sci, College Pk, MD 20742 USA
[5] Bar Ilan Univ, Dept Comp Sci, IL-52900 Ramat Gan, Israel
[6] Univ Maryland, Ctr Automat Res, College Pk, MD 20742 USA
Keywords
clustering; ISODATA; k-means; filtering algorithm; kd-trees; approximation
DOI
10.1142/S0218195907002252
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
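The abstract does not say how the dispersion estimate is modified. The following is a minimal sketch under one plausible assumption: that dispersion is measured as a root-mean-square distance, which can be recovered from per-cell aggregates (point count, coordinate-wise sum, sum of squared norms) of the kind a kd-tree node could store. The function names and the choice of aggregates are hypothetical illustrations, not taken from the paper.

```python
import math

def dispersion_direct(points, center):
    """RMS distance from the points to the center, visiting every point."""
    total = sum(sum((p - c) ** 2 for p, c in zip(pt, center)) for pt in points)
    return math.sqrt(total / len(points))

def aggregates(points):
    """Hypothetical per-cell summary: count, coordinate-wise sum, sum of squared norms."""
    dim = len(points[0])
    vec_sum = [sum(pt[i] for pt in points) for i in range(dim)]
    sq_sum = sum(x * x for pt in points for x in pt)
    return len(points), vec_sum, sq_sum

def dispersion_from_aggregates(count, vec_sum, sq_sum, center):
    """Same RMS distance, using only the aggregates.

    Relies on the identity
    (1/n) * sum ||x - z||^2 = (1/n) * sum ||x||^2 - 2 z . (sum x / n) + ||z||^2.
    """
    dot = sum(c * s for c, s in zip(center, vec_sum))
    norm2 = sum(c * c for c in center)
    mean_sq = sq_sum / count - 2.0 * dot / count + norm2
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mean_sq, 0.0))

points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = (1.0, 1.0)
count, vec_sum, sq_sum = aggregates(points)
print(dispersion_direct(points, center))
print(dispersion_from_aggregates(count, vec_sum, sq_sum, center))
```

Both calls return the same value for the same points and center. The aggregate form needs no pass over individual points, which is why a squared-distance measure fits naturally with a tree that caches summaries per node. An average of unsquared Euclidean distances has no such closed form.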
Pages: 71-103
Page count: 33