Distributed Dual Averaging Based Data Clustering

被引:2
作者
Servetnyk, Mykola [1 ]
Fung, Carrson C. C. [1 ]
机构
[1] Natl Yang Ming Chiao Tung Univ, Inst Elect, Taipei 300, Taiwan
关键词
Clustering algorithms; Distributed databases; Convergence; Big Data; Symmetric matrices; Approximation algorithms; Task analysis; distributed algorithms; unbalanced data; security conscious algorithm; subgradient methods; ALGORITHM;
D O I
10.1109/TBDATA.2022.3146169
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Multiagent distributed clustering scheme is proposed herein to process data which are collected by dispersed sensors that are not under centralized control. Two methods based on distributed dual averaging (DDA) algorithm are proposed, which are able to incorporate network structure and do not require exchange of centroid estimates, which makes it appealing for security conscious applications. The first method provides the framework for distributed clustering using the DDA algorithm with predefined regularization parameter. The second method, called Adaptive DDA (ADDA), relaxes the condition concerning a priori knowledge about the centroids, assumed in the first method, without losing clustering performance. This is achieved by properly regularizing the problem where a data-driven approach is used to determine the regularization parameter. The proposed methods are further extended via the proposed Bin method to scenario where processing agents store unbalanced amount of data with non-IID class distribution. Experiments are conducted on both real-life and synthetic data. Numerical results show the efficacy of the proposed approaches compared to state-of-art centralized algorithm and other distributed approaches.
引用
收藏
页码:372 / 379
页数:8
相关论文
共 26 条
  • [1] Distributed data clustering over networks
    Altilio, Rosa
    Di Lorenzo, Paolo
    Panella, Massimo
    [J]. PATTERN RECOGNITION, 2019, 93 : 603 - 620
  • [2] Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
  • [3] Gravitational Clustering: A simple, robust and adaptive approach for distributed networks
    Binder, Patricia
    Muma, Michael
    Zoubir, Abdelhak M.
    [J]. SIGNAL PROCESSING, 2018, 149 : 36 - 48
  • [4] Boyd S., 2004, CONVEX OPTIMIZATION, DOI [DOI 10.1017/CBO9780511804441, 10.1017/CBO9780511804441]
  • [5] A data-clustering algorithm on distributed memory multiprocessors
    Dhillon, IS
    Modha, DS
    [J]. LARGE-SCALE PARALLEL DATA MINING, 2000, 1759 : 245 - 260
  • [6] Dua Dheeru, 2019, UCI machine learning repository
  • [7] Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
    Duchi, John C.
    Agarwal, Alekh
    Wainwright, Martin J.
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2012, 57 (03) : 592 - 606
  • [8] Ester M., 1996, P 2 INT C KNOWL DISC, P226, DOI DOI 10.5555/3001460.3001507
  • [9] A Time Series Clustering Technique based on Community Detection in Networks
    Ferreira, Leonardo N.
    Zhao, Liang
    [J]. INNS CONFERENCE ON BIG DATA 2015 PROGRAM, 2015, 53 : 183 - 190
  • [10] Forero P. A., 2008, P WORKSH SENS SIGN I, P11