On the Parallel Complexity of Hierarchical Clustering and CC-Complete Problems

被引:9
作者
Greenlaw, Raymond [1 ]
Kantabutra, Sanpawat [2 ]
机构
[1] Armstrong Atlantic State Univ, Dept Comp Sci, Savannah, GA 31419 USA
[2] Chiang Mai Univ, Theory Computat Grp, Dept Comp Sci, Chiang Mai 50200, Thailand
关键词
CC-completeness; data mining; hierarchical clustering; and parallel depth;
D O I
10.1002/cplx.20238
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n(2)-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUIT VALUE PROBLEM class. This class contains a variety of interesting problems, and now for the first time a problem from data mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. (C) 2008 Wiley Periodicals, Inc. Complexity 14: 18-28, 2008
引用
收藏
页码:18 / 28
页数:11
相关论文
共 34 条
[1]  
Agnarsson G., 2007, Graph Theory: Modeling, Applications, and Algorithms, Pearson international edition
[2]  
[Anonymous], 1983, NOTES INTRO COMBINAT
[3]  
BLUMENTHAL L., 1953, THEORY APPL DISTANCE
[4]   Principal direction divisive partitioning [J].
Boley, D .
DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (04) :325-344
[5]   Concurrent threads and optimal parallel minimum spanning trees algorithm [J].
Chong, KW ;
Han, YJ ;
Lam, TW .
JOURNAL OF THE ACM, 2001, 48 (02) :297-323
[6]   PARALLEL MERGE SORT [J].
COLE, R .
SIAM JOURNAL ON COMPUTING, 1988, 17 (04) :770-785
[7]   A TAXONOMY OF PROBLEMS WITH FAST PARALLEL ALGORITHMS [J].
COOK, SA .
INFORMATION AND CONTROL, 1985, 64 (1-3) :2-22
[8]  
Dash M, 2004, LECT NOTES COMPUT SC, V3149, P363
[9]   A NEW FIXED-POINT APPROACH FOR STABLE NETWORKS AND STABLE MARRIAGES [J].
FEDER, T .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1992, 45 (02) :233-284
[10]  
Gibbons A., 1985, ALGORITHMIC GRAPH TH