Clustering for Metric and Nonmetric Distance Measures

被引:57
作者
Ackermann, Marcel R. [1 ]
Bloemer, Johannes [1 ]
Sohler, Christian [2 ]
机构
[1] Univ Paderborn, Dept Comp Sci, D-33095 Paderborn, Germany
[2] Tech Univ, Dept Comp Sci, D-44221 Dortmund, Germany
关键词
Approximation algorithm; Bregman divergences; Itakura-Saito divergence; k-means clustering; k-median clustering; Kullback-Leibler divergence; Mahalanobis distance; random sampling;
D O I
10.1145/1824777.1824779
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We study a generalization of the k-median problem with respect to an arbitrary dissimilarity measure D. Given a finite set P of size n, our goal is to find a set C of size k such that the sum of errors D(P, C) = Sigma(p is an element of P) min(c is an element of C) {D(p, c)} is minimized. The main result in this article can be stated as follows: There exists a (1 + epsilon)-approximation algorithm for the k-median problem with respect to D, if the I-median problem can be approximated within a factor of (1 + epsilon) by taking a random sample of constant size and solving the I-median problem on the sample exactly. This algorithm requires time n2(O(mklog(mk/epsilon))), where m is a constant that depends only on epsilon and D. Using this characterization, we obtain the first linear time (1 + epsilon)-approximation algorithms for the k-median problem in an arbitrary metric space with bounded doubling dimension, for the Kullback-Leibler divergence (relative entropy), for the Itakura-Saito divergence, for Mahalanobis distances, and for some special cases of Bregman divergences. Moreover, we obtain previously known results for the Euclidean k-median problem and the Euclidean k-means problem in a simplified manner. Our results are based on a new analysis of an algorithm of Kumar et al. [2004].
引用
收藏
页数:26
相关论文
共 39 条
[1]  
Ackermann MR, 2009, PROCEEDINGS OF THE TWENTIETH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1088
[2]  
ACKERMANN MR, 2001, P 19 ANN ACM SIAM S, P799
[3]  
[Anonymous], 2006, Elements of Information Theory
[4]  
[Anonymous], 1936, P NATL I SCI INDIA, DOI DOI 10.1007/S13171-019-00164-5
[5]  
[Anonymous], 2007, P 18 ANN ACM SIAM S
[6]  
[Anonymous], 2003, P 35 ANN ACM S THEOR, DOI [10.1145/780542.780550, DOI 10.1145/780542.780550]
[7]  
[Anonymous], P 31 ANN M OH STAT U
[8]  
BADOIU M, 2002, P 34 ANN ACM S THEOR, P250, DOI DOI 10.1145/509907.509947
[9]   THE ALGEBRAIC DEGREE OF GEOMETRIC OPTIMIZATION PROBLEMS [J].
BAJAJ, C .
DISCRETE & COMPUTATIONAL GEOMETRY, 1988, 3 (02) :177-191
[10]  
Baker L. D., 1998, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P96, DOI 10.1145/290941.290970