Parallel Two-Phase K-Means

Cited by: 0
Authors
Cuong Duc Nguyen [1]
Dung Tien Nguyen [1]
Van-Hau Pham [1]
Affiliations
[1] Int Univ VNU HCM, Hanoi, Vietnam
Source
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2013, PT V | 2013 / Vol. 7975
Keywords
Data Clustering; K-means; Parallel Distributed Computing; MapReduce; MEANS ALGORITHM;
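DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract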
This paper introduces Parallel Two-Phase K-means (Par2PK-means), a new parallel version of Two-Phase K-means designed to overcome the limitations of existing parallel versions. Par2PK-means is developed and executed on the MapReduce framework and runs in two phases. In the first phase, Mappers work independently on data segments to produce intermediate data. In the second phase, the Reducer clusters the intermediate data collected from the Mappers to produce the final clustering result. In tests on large data sets, the proposed algorithm achieved a good speedup ratio, close to linear speedup, compared with the sequential Two-Phase K-means.
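The two-phase flow described in the abstract can be illustrated with a minimal single-process sketch. The abstract does not say what form the intermediate data takes, so this sketch assumes each Mapper emits weighted local centroids and the Reducer runs a weighted K-means over them. The function names (`weighted_kmeans`, `mapper`, `reducer`, `par2pk_sketch`) and the parameters `k_local` and `n_segments` are hypothetical, and the code runs sequentially in plain Python rather than on a MapReduce cluster. It is not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points: List[Point], weights: List[float], k: int,
                    iters: int = 20, seed: int = 0):
    """Lloyd iterations on weighted points.

    Returns the centroids and the total weight assigned to each one
    during the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # requires k <= len(points)
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Empty clusters keep their previous centroid.
        centroids = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centroids[j]
            for j in range(k)
        ]
    return centroids, totals


def mapper(segment: List[Point], k_local: int, seed: int = 0):
    """Phase 1 (assumed form): summarize a segment as (centroid, weight) pairs."""
    k_local = min(k_local, len(segment))
    centroids, totals = weighted_kmeans(segment, [1.0] * len(segment), k_local, seed=seed)
    return [(c, w) for c, w in zip(centroids, totals) if w > 0]


def reducer(intermediate, k: int, seed: int = 0) -> List[Point]:
    """Phase 2 (assumed form): cluster the weighted summaries from all mappers."""
    points = [c for c, _ in intermediate]
    weights = [w for _, w in intermediate]
    centroids, _ = weighted_kmeans(points, weights, min(k, len(points)), seed=seed)
    return centroids


def par2pk_sketch(data: List[Point], n_segments: int, k_local: int, k: int, seed: int = 0):
    """Run both phases in one process; returns (final centroids, intermediate data)."""
    segments = [data[i::n_segments] for i in range(n_segments)]
    intermediate = []
    for i, seg in enumerate(segments):
        if seg:  # mappers are independent, so a real job could run them in parallel
            intermediate.extend(mapper(seg, k_local, seed=seed + i))
    return reducer(intermediate, k, seed=seed), intermediate
```

Because each mapper touches only its own segment, the phase-1 loop has no shared state, which is the property that lets it scale across workers. The single reducer then sees only the much smaller set of summaries.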
Pages: 224-231
Number of pages: 8
Related papers
13 in total
  • [1] [Anonymous], 2006, NIPS
  • [2] Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '04), P137
  • [3] Frank A., 2010, UCI machine learning repository, V213
  • [4] Jinlan T., 2005, TSINGHUA SCI TECHNOL, V10, P277
  • [5] Kantabutra S., 2000, NECTEC Technical journal, V1, P243
  • [6] ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use
    Kraj, Piotr
    Sharma, Ashok
    Garge, Nikhil
    Podolsky, Robert
    McIndoe, Richard A.
    [J]. BMC BIOINFORMATICS, 2008, 9 (1)
  • [7] Macqueen J., 1967, 5 BERK S MATH STAT P, P281, DOI 10.1007/S11665-016-2173-6
  • [8] Clustering Large Databases in Distributed Environment
    Pakhira, Malay K.
    [J]. 2009 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE, VOLS 1-3, 2009, : 351 - 358
  • [9] A two-phase K-means algorithm for large datasets
    Pham, DT
    Dimov, SS
    Nguyen, CD
    [J]. PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART C-JOURNAL OF MECHANICAL ENGINEERING SCIENCE, 2004, 218 (10) : 1269 - 1273
  • [10] An incremental K-means algorithm
    Pham, DT
    Dimov, SS
    Nguyen, CD
    [J]. PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART C-JOURNAL OF MECHANICAL ENGINEERING SCIENCE, 2004, 218 (07) : 783 - 795