Incremental clustering and dynamic information retrieval

被引：147

作者：

Charikar, M ^{[1
]}

Chekuri, C ^{[1
]}

Feder, T ^{[1
]}

Motwani, R ^{[1
]}

机构：

[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

来源：

SIAM JOURNAL ON COMPUTING | 2004年 / 33卷 / 06期

关键词：

incremental clustering; dynamic information retrieval; minimum diameter clustering; agglomerative clustering; k-center; performance guarantee;

D O I：

10.1137/S0097539702418498

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms which have a provably good performance, and which we believe should also perform well in practice. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem where the clusters are of fixed diameter, and the goal is to minimize the number of clusters.

引用

页码：1417 / 1440

页数：24

共 50 条

[1]

Aldenderfer M., 1984, Cluster Analysis, DOI DOI 10.4135/9781412983648

[2]

[Anonymous], 2003, P 35 ANN ACM S THEOR, DOI DOI 10.1145/780542.780548

[3]

[Anonymous], 1996, APPROXIMATION ALGORI

[4]

[Anonymous], 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness

[5]

[Anonymous], 1994, MANAGING GIGABYTES C

[6]

Arya V., 2001, 33 ACM S THEORY COMP, P21

[7]

BARTAL Y, 2001, P 33 ANN ACM S THEOR, P11

[8]

BERN M, 1996, APPROXIMATION ALGORI, P296

[9]

Bonnesen T., 1974, THEORIE KONVEXEN KOR

[10]

Brucker P, 1977, OPTIMIZATION OPERATI, P45

← 1 2 3 4 5 →