A Data Science and Engineering Solution for Fast k-Means Clustering of Big Data

被引:39
|
作者
Dierckens, Karl E. [1 ]
Harrison, Adrian B. [1 ]
Leung, Carson K. [1 ]
Pind, Adrienne V. [1 ]
机构
[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada
来源
2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS | 2017年
基金
加拿大自然科学与工程研究理事会;
关键词
Big data; data mining; clustering; k-means; IMPLEMENTATION; ALGORITHM;
D O I
10.1109/Trustcom/BigDataSE/ICESS.2017.332
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data arc implicit, previously unknown and potentially useful information and knowledge. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data are in demand. A popular and practical data mining task is to group similar data or objects into clusters (i.e., clustering). While k-means clustering is popular and leads good-quality results, its associated algorithms may suffer from a few problems (e.g., risks associated with randomly selected k representatives, tendency to produce spherical clusters, high runtime complexity). To deal with these problems, we present in this paper a fast big data science and engineering solution that applies a fast k-means clustering heuristic for grouping similar big data objects. Evaluation results show the efficiency and scalability of our solution in k-means clustering of big data.
引用
收藏
页码:925 / 932
页数:8
相关论文
共 50 条
  • [31] k-Means Clustering of Asymmetric Data
    Olszewski, Dominik
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT I, 2012, 7208 : 243 - 254
  • [32] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [33] Enhancement of the K-Means Algorithm for Mixed Data in Big Data Platforms
    Koren, Oded
    Hallin, Carina Antonia
    Perel, Nir
    Bendet, Dror
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2019, 868 : 1025 - 1040
  • [34] An Improvement to the K-means Algorithm Oriented to Big Data
    Perez Ortega, Joaquin
    Rodolfo Pazos, R.
    Hidalgo, Miguel
    Almanza, Nelva
    Diaz-Parra, Ocotlan
    Santaolaya, Rene
    Caballero, Vitervo
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE OF NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2014 (ICNAAM-2014), 2015, 1648
  • [35] TURNING BIG DATA INTO TINY DATA: CONSTANT-SIZE CORESETS FOR k-MEANS, PCA, AND PROJECTIVE CLUSTERING
    Feldman, Dan
    Schmidt, Melanie
    Sohler, Christian
    SIAM JOURNAL ON COMPUTING, 2020, 49 (03) : 601 - 657
  • [36] IMPROVEMENT IN K-MEANS CLUSTERING ALGORITHM FOR DATA CLUSTERING
    Rajeswari, K.
    Acharya, Omkar
    Sharma, Mayur
    Kopnar, Mahesh
    Karandikar, Kiran
    1ST INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION ICCUBEA 2015, 2015, : 367 - 369
  • [37] Combined Elephant Herding Optimization Algorithm with K-means for Data Clustering
    Tuba, Eva
    Dolicanin-Djekic, Diana
    Jovanovic, Raka
    Simian, Dana
    Tuba, Milan
    INFORMATION AND COMMUNICATION TECHNOLOGY FOR INTELLIGENT SYSTEMS, ICTIS 2018, VOL 2, 2019, 107 : 665 - 673
  • [38] Efficient and Privacy-Preserving k-means clustering For Big Data Mining
    Gheid, Zakaria
    Challal, Yacine
    2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 791 - 798
  • [39] Big Data Clustering Analysis Algorithm for Internet of Things Based on K-Means
    Yu, Zhanqiu
    INTERNATIONAL JOURNAL OF DISTRIBUTED SYSTEMS AND TECHNOLOGIES, 2019, 10 (01) : 1 - 12
  • [40] Analysis of K-Means and K-Medoids Algorithm For Big Data
    Arora, Preeti
    Deepali
    Varshney, Shipra
    1ST INTERNATIONAL CONFERENCE ON INFORMATION SECURITY & PRIVACY 2015, 2016, 78 : 507 - 512