k-Center Clustering with Outliers in Sliding Windows

被引:3
作者
Pellizzoni, Paolo [1 ]
Pietracaprina, Andrea [1 ]
Pucci, Geppino [1 ]
机构
[1] Univ Padua, Dept Informat Engn, Via Gradenigo 6-B, I-35131 Padua, Italy
关键词
k-center with outliers; effective diameter; big data; data stream model; sliding windows; coreset; doubling dimension; approximation algorithms; ALGORITHMS;
D O I
10.3390/a15020052
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Metric k-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so a more sensible variant seeks for the best solution that disregards a given number z of points of the dataset, which are called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window W of the most recent data items. For general metric spaces, our algorithms achieve O1 approximation and, remarkably, require a working memory linear in k+z and only logarithmic in |W|. For spaces of bounded doubling dimension, the approximation can be made arbitrarily close to 3. For these latter spaces, we show, as a by-product, how to estimate the effective diameter of the window W, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances. We also provide experimental evidence of the practical viability of the improved clustering and diameter estimation algorithms.
引用
收藏
页数:26
相关论文
共 29 条
  • [1] Altieri F., 2021, P SIAM INT C DAT MIN, P648
  • [2] Bateni M, 2021, AAAI CONF ARTIF INTE, V35, P3941
  • [3] Borassi M., 2020, P 34 NEUR INF PROC S
  • [4] Braverman V., 2016, Encyclopedia of Algorithms, P2006, DOI DOI 10.1007/978-1-4939-2864-4797
  • [5] Smooth histograms for sliding windows
    Braverman, Vladimir
    Ostrovsky, Rafail
    [J]. 48TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2007, : 283 - +
  • [6] Braverman Vladimir, 2016, SODA, P1374, DOI 10.1137/1.9781611974331.ch95
  • [7] Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially
    Ceccarello, Matteo
    Pietracaprina, Andrea
    Pucci, Geppino
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2019, 12 (07): : 766 - 778
  • [8] The Non-Uniform k-Center Problem
    Chakrabarty, Deeparnab
    Goyal, Prachi
    Krishnaswamy, Ravishankar
    [J]. ACM TRANSACTIONS ON ALGORITHMS, 2020, 16 (04)
  • [9] Fully Dynamic k-Center Clustering
    Chan, T-H. Hubert
    Guerquin, Arnaud
    Sozio, Mauro
    [J]. WEB CONFERENCE 2018: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW2018), 2018, : 579 - 587
  • [10] Charikar M, 2001, SIAM PROC S, P642