An efficient parallel clustering algorithm for large scale database

Cited by: 0
Authors
School of Electronic Information, Wuhan University, Wuhan, Hubei, China [1]
Unknown [2]
Unknown [3]
Affiliations
[1] School of Electronic Information, Wuhan University, Wuhan, Hubei
[2] Hubei Bureau of Surveying and Mapping, Wuhan, Hubei
[3] PRC Education, Intel China Ltd., Shanghai
Source
J. Softw. | 2009 / Vol. 4 / No. 10 / pp. 1119-1126
Keywords
Clustering; Parallel pattern; Parallel processing; Performance analysis; SLPP; SLPPCA
DOI
10.4304/jsw.4.10.1119-1126
Abstract
In this paper, we propose a new parallel clustering algorithm named the Stem-Leaf-Point Plot Clustering Algorithm (SLPPCA). SLPPCA can produce clusters of different shapes and sizes and, according to our experiments, produces them more efficiently than traditional methods. It fully exploits the data parallelism of the data objects and uses a task-decomposition design step to balance the workload across the cores of a multi-core processor, achieving a high speedup. We implemented SLPPCA for a large-scale database on dual-core and quad-core computers and analyzed its performance on each. The experimental results show that the clusters it produced were particularly good for data of varying density or shape. Furthermore, with the parallel pattern used in SLPPCA on a multi-core platform, the speedup was almost linear in the number of processor cores and in the number of data points. Moreover, SLPPCA automatically determines a satisfactory number of clusters during the clustering process. © 2009 Academy Publisher.
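The abstract does not describe SLPPCA's internal steps, so the following is only a minimal sketch of the general pattern it mentions: split the data points into near-equal chunks so each worker gets a balanced load, process the chunks concurrently, then merge the partial results. The per-chunk work shown here (counting values per stem, as in an ordinary stem-and-leaf plot) is a stand-in chosen for illustration, and the names `split_evenly`, `stem_counts`, `parallel_stem_counts`, and `leaf_unit` are hypothetical, not taken from the paper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, n_workers):
    """Split items into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def stem_counts(chunk, leaf_unit=10):
    """Count how many values share each stem (value // leaf_unit). Illustrative stand-in work."""
    return Counter(v // leaf_unit for v in chunk)


def parallel_stem_counts(values, n_workers=4, leaf_unit=10):
    """Data-parallel pattern: balanced split, concurrent per-chunk work, merge."""
    chunks = split_evenly(values, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each chunk is processed independently; partial counts are merged afterwards.
        for partial in pool.map(lambda c: stem_counts(c, leaf_unit), chunks):
            total.update(partial)
    return total


if __name__ == "__main__":
    data = [3, 12, 15, 27, 21, 29, 41]
    print(parallel_stem_counts(data, n_workers=3))
```

Threads are used here only to keep the sketch short and portable; in CPython they will not speed up CPU-bound work, so a process pool or a native multi-threaded implementation would be needed to observe the kind of multi-core speedup the abstract reports.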
Pages: 1119-1126
Page count: 8