Conquering the divide: Continuous clustering of distributed data streams

Cited by: 0
Authors
Cormode, Graham
Muthukrishnan, S.
Zhuang, Wei
Source
2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3 | 2007
Keywords
LOCATION-PROBLEMS;
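DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract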
Data is often collected over a distributed network, but in many cases it is so voluminous that collecting it in a central location is impractical and undesirable. Instead, we must perform distributed computations over the data, guaranteeing high-quality answers even as new data arrives. In this paper we formalize and study the problem of maintaining a clustering of such distributed, continuously evolving data. In particular, our goal is to minimize communication and computational cost while still providing guaranteed accuracy of the clustering. We focus on k-center clustering and provide a suite of algorithms that vary based on which centralized algorithm they derive from, and on whether they maintain a single global clustering or many local clusterings that can be merged together. We show that these algorithms can be designed to give accuracy guarantees that are close to the best possible even in the centralized case. In our experiments, we see clear trends among these algorithms, showing that the choice of algorithm is crucial, and that we can achieve a clustering that is as good as the best centralized clustering with only a small fraction of the communication required to collect all the data in a single location.
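The abstract does not say which centralized algorithms the paper builds on, so the following is only an illustrative sketch of the two ideas it names: k-center clustering and merging many local clusterings. It uses the well-known greedy farthest-point heuristic (a standard 2-approximation for k-center in the centralized setting). The function names `farthest_point_centers`, `clustering_radius`, and `merge_local_centers` are hypothetical, and the merge step shown here (re-clustering the pooled local centers) is an assumption for illustration, not the authors' protocol.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_point_centers(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy farthest-point heuristic: pick up to k centers from `points`."""
    if not points or k <= 0:
        return []
    centers = [points[0]]  # arbitrary first center
    # dist[i] is the distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)
        if dist[i] == 0.0:
            break  # every point already coincides with a center
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def clustering_radius(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-center objective: the largest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def merge_local_centers(local_centers: Sequence[Sequence[Point]], k: int) -> List[Point]:
    """Assumed merge step: pool each site's centers and re-cluster the pool."""
    pooled = [c for site in local_centers for c in site]
    return farthest_point_centers(pooled, k)


if __name__ == "__main__":
    # Two hypothetical sites, each summarizing its own data with k = 2 centers.
    site_a = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    site_b = [(0.0, 1.0), (10.0, 1.0), (11.0, 0.0)]
    local = [farthest_point_centers(s, 2) for s in (site_a, site_b)]
    merged = merge_local_centers(local, 2)
    print(merged, clustering_radius(site_a + site_b, merged))
```

In this sketch each site sends only its k centers rather than its raw points, which is the kind of communication saving the abstract describes; the accuracy guarantees claimed in the paper apply to the authors' algorithms, not to this simplified merge.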
Pages: 1011 - 1020
Page count: 10
Related papers
50 in total
  • [1] Distributed clustering of ubiquitous data streams
    Rodrigues, Pedro Pereira
    Gama, Joao
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 4 (01) : 38 - 54
  • [2] Clustering Distributed Sensor Data Streams
    Rodrigues, Pedro Pereira
    Gama, Joao
    Lopes, Luis
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART II, PROCEEDINGS, 2008, 5212 : 282 - +
  • [3] Approximate clustering on distributed data streams
    Zhang, Qi
    Liu, Jinze
    Wang, Wei
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 1131 - +
  • [4] A Clustering Approach for Anonymizing Distributed Data Streams
    Mohamed, Mona A.
    Nagi, Magdy H.
    Ghanem, Sahar M.
    PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016, : 9 - 16
  • [5] Monitoring Distributed Data Streams through Node Clustering
    Barouti, Maria
    Keren, Daniel
    Kogan, Jacob
    Malinovsky, Yaakov
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, MLDM 2014, 2014, 8556 : 149 - 162
  • [6] Continuous trend-based clustering in data streams
    Kontaki, Maria
    Papadopoulos, Apostolos N.
    Manolopoulos, Yannis
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2008, 5182 : 251 - 262
  • [7] Continuous Skyline Monitoring over Distributed Data Streams
    Lu, Hua
    Zhou, Yongluan
    Haustad, Jonas
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2010, 6187 : 565 - +
  • [8] Continuous adaptive outlier detection on distributed data streams
    Su, Liang
    Han, Weihong
    Yang, Shuqiang
    Zou, Peng
    Jia, Yan
    HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2007, 4782 : 74 - 85
  • [9] Gossip-based Spectral Clustering of Distributed Data Streams
    Talistu, Matt
    Moh, Teng-Sheng
    Moh, Melody
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2015), 2015, : 325 - 333
  • [10] Distributed weighted clustering of evolving sensor data streams with noise
    Hassani, Marwan
    Seidl, Thomas
    Journal of Digital Information Management, 2012, 10 (06): : 410 - 420