Big Data Clustering: A Review

Citations: 0
Authors
Shirkhorshidi, Ali Seyed [1]
Aghabozorgi, Saeed [1]
Teh, Ying Wah [1]
Herawan, Tutut [1]
Affiliation
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
Source
COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2014, PT V | 2014 / Vol. 8583
Keywords
Big Data; Clustering; MapReduce; Parallel Clustering; WAY PARTITIONING SCHEME; EXCEPTION RULES; ALGORITHM; DBSCAN; FUZZY
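DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract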
Clustering is an essential data mining tool for analyzing big data. Applying clustering techniques to big data is difficult because of the new challenges that big data raises. Because big data refers to terabytes and petabytes of data, and clustering algorithms come with high computational costs, the question is how to cope with this problem and how to deploy clustering techniques on big data and obtain results in a reasonable time. This study reviews the trend and progress of clustering algorithms in coping with big data challenges, from the earliest proposed algorithms to today's novel solutions. The algorithms and the challenges they target in producing improved clustering algorithms are introduced and analyzed, and the possible future path toward more advanced algorithms is then outlined based on today's available technologies and frameworks.
Pages: 707-720
Page count: 14