A Performance Comparison of Big Data Processing Platform Based on Parallel Clustering Algorithms

Times cited: 4

Authors:

Hai, Mo [1,2]

Zhang, Yuejing [1]

Li, Haifeng [1]

Affiliations:

[1] Central University of Finance and Economics, School of Information, Beijing 100081, People's Republic of China

[2] University of Electronic Science and Technology of China, Network and Data Security Key Laboratory of Sichuan Province, Chengdu 610054, Sichuan, People's Republic of China

Source:

6TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT | 2018 / Vol. 139

Keywords:

Hadoop; Spark; DataMPI; K-means; fuzzy K-means; Canopy; MapReduce

DOI:

10.1016/j.procs.2018.10.228

Chinese Library Classification (CLC) number:

TP301 [Theory and Methods]

Discipline classification code:

081202

Abstract:

The performance of three typical big data processing platforms, Hadoop, Spark and DataMPI, is compared using three parallel clustering algorithms: parallel K-means, parallel fuzzy K-means and parallel Canopy. Experiments are performed on different text and numeric datasets and on clusters of different scales. The results show that: (1) for the same dataset, when each node has 4 GB of memory, DataMPI achieves about a 60% performance improvement over Hadoop and about a 32% improvement over Spark; (2) to obtain high clustering performance, a cluster of 6 nodes with 6 GB of memory per node should be selected. (C) 2018 The Authors. Published by Elsevier B.V.
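Parallel K-means, one of the three benchmark algorithms, is commonly split into a map step (assign each point in a data partition to its nearest centroid and emit per-cluster partial sums and counts) and a reduce step (merge the partial sums and recompute each centroid), repeated until the centroids stop moving. The sketch below is a minimal single-process illustration of that decomposition, assuming in-memory Python lists stand in for the data partitions a platform would distribute. It is not the authors' implementation and does not use any Hadoop, Spark, or DataMPI API; the function names `map_partition`, `reduce_sums` and `parallel_kmeans` are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_partition(points, centroids):
    """Map step (hypothetical name): for one partition, emit
    (cluster_id, (partial_sum_vector, count)) pairs."""
    partial = {}
    for pt in points:
        k = nearest(pt, centroids)
        s, n = partial.get(k, ([0.0] * len(pt), 0))
        partial[k] = ([a + b for a, b in zip(s, pt)], n + 1)
    return list(partial.items())


def reduce_sums(mapped, old_centroids):
    """Reduce step (hypothetical name): merge partial sums per cluster
    and divide by the total count to get the new centroids."""
    totals = {}
    for pairs in mapped:
        for k, (s, n) in pairs:
            ts, tn = totals.get(k, ([0.0] * len(s), 0))
            totals[k] = ([a + b for a, b in zip(ts, s)], tn + n)
    new = []
    for k, c in enumerate(old_centroids):
        if k in totals:
            s, n = totals[k]
            new.append([x / n for x in s])
        else:
            new.append(list(c))  # empty cluster keeps its previous centroid
    return new


def parallel_kmeans(partitions, centroids, max_iter=20, tol=1e-6):
    """Driver loop: one map over every partition plus one reduce per iteration,
    stopping when no centroid moves more than `tol`."""
    for _ in range(max_iter):
        mapped = [map_partition(part, centroids) for part in partitions]
        new = reduce_sums(mapped, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids


if __name__ == "__main__":
    partitions = [
        [[1.0, 1.0], [1.5, 2.0]],
        [[8.0, 8.0], [9.0, 9.5]],
        [[1.0, 0.5], [8.5, 9.0]],
    ]
    print(parallel_kmeans(partitions, [[1.0, 1.0], [9.0, 9.0]]))
```

In this sketch only the per-cluster partial sums and counts cross partition boundaries, so the reduce step handles k small records per partition rather than the raw points; the data itself is re-scanned by the map step on every iteration.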


Pages: 127-135

Number of pages: 9
