A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data

Times cited: 39
Authors
Dierckens, Karl E. [1]
Harrison, Adrian B. [1]
Leung, Carson K. [1]
Pind, Adrienne V. [1]
Affiliation
[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada
Source
2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS | 2017
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Big data; data mining; clustering; k-means; IMPLEMENTATION; ALGORITHM
DOI
10.1109/Trustcom/BigDataSE/ICESS.2017.332
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data are implicit, previously unknown and potentially useful information and knowledge. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data are in demand. A popular and practical data mining task is to group similar data or objects into clusters (i.e., clustering). While k-means clustering is popular and leads to good-quality results, its associated algorithms may suffer from a few problems (e.g., risks associated with randomly selected k representatives, a tendency to produce spherical clusters, high runtime complexity). To deal with these problems, we present in this paper a fast big data science and engineering solution that applies a fast k-means clustering heuristic for grouping similar big data objects. Evaluation results show the efficiency and scalability of our solution in k-means clustering of big data.
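For context, the following is a minimal sketch of standard (Lloyd's) k-means in Python. It is NOT the authors' fast heuristic, which the abstract does not describe; it only illustrates the baseline whose problems the abstract lists. Two of those problems are visible in the code: the k initial representatives are drawn at random, and every iteration compares each of the n points with all k centroids, i.e. O(n·k·d) distance work per pass for d-dimensional data. The function name kmeans and its parameters (iters, seed) are illustrative assumptions.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Baseline Lloyd's k-means on a list of equal-length numeric tuples.

    Illustrative sketch only (hypothetical helper), not the paper's heuristic.
    Returns (centroids, clusters), where clusters[j] holds the points
    assigned to centroids[j].
    """
    rng = random.Random(seed)
    # Randomly pick k distinct points as the initial representatives;
    # this random choice is one of the risks the abstract mentions.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins the centroid with the smallest
        # squared Euclidean distance (n * k distance computations).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(c)
        # Stop once no centroid moves (assignments are then stable).
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

For example, kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2) separates the two groups into clusters of three points each. Because each point goes to its nearest centroid by Euclidean distance, every cluster is the convex region closer to its centroid than to any other, which is one reason plain k-means tends toward roughly spherical clusters.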
Pages: 925-932
Page count: 8