Large-Scale Automatic K-Means Clustering for Heterogeneous Many-Core Supercomputer

Cited by: 4
Authors
Yu, Teng [1]
Zhao, Wenlai [2, 3]
Liu, Pan [2, 3]
Janjic, Vladimir [1]
Yan, Xiaohan [4]
Wang, Shicai [5]
Fu, Haohuan [2, 3]
Yang, Guangwen [2, 3]
Thomson, John [1]
Affiliations
[1] Univ St Andrews, St Andrews KY16 9AJ, Fife, Scotland
[2] Tsinghua Univ, Beijing 100084, Peoples R China
[3] Natl Supercomp Ctr, Wuxi 214072, Jiangsu, Peoples R China
[4] Univ Calif Berkeley, Berkeley, CA 94720 USA
[5] Wellcome Trust Sanger Inst, Saffron Walden CB10 1SA, Essex, England
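Funding
Engineering and Physical Sciences Research Council (UK); National Key Research and Development Program of China; China Postdoctoral Science Foundation
Keywords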
Supercomputer; heterogeneous many-core processor; data partitioning; clustering; scheduling; AutoML; ALGORITHM
DOI
10.1109/TPDS.2019.2955467
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that partitions not only by dataflow and centroid but also by dimension, which unlocks the potential of the hierarchical parallelism in the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design is able to process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids, while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering: it automatically generates and executes the clustering tasks with a set of candidate hyper-parameters, and then determines the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution not only achieves high performance and scalability for problems with massive high-dimensional data, but also supports clustering without sufficient prior knowledge of the number of target clusters, which can potentially extend the scope of the k-means algorithm to new application areas.
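The two ideas in the abstract can be illustrated with a small, single-process Python sketch. It is not the authors' implementation and does not model the Sunway TaihuLight hardware. Its first part shows why partitioning by dimension and by centroid is possible: a squared Euclidean distance is a sum over dimensions, so slices can be computed separately and reduced, and the nearest-centroid search is a minimum over centroid groups. Its second part sweeps a set of candidate k values and keeps the best-scoring one. The abstract does not describe the paper's evaluation method, so the silhouette score is swapped in here as an assumption. The farthest-point initialization and all function names (`chunks`, `assign`, `kmeans`, `auto_kmeans`) are likewise hypothetical choices made for this illustration.

```python
import math

def chunks(n, parts):
    """Split range(n) into contiguous, near-equal slices (at most `parts`)."""
    step = -(-n // parts)  # ceiling division
    return [range(i, min(i + step, n)) for i in range(0, n, step)]

def partial_sq_dist(p, c, dims):
    """Squared-distance contribution from one slice of dimensions."""
    return sum((p[d] - c[d]) ** 2 for d in dims)

def assign(p, centroids, dim_parts=2, centroid_parts=2):
    """Nearest-centroid index, computed the way partitioned workers could."""
    dim_slices = chunks(len(p), dim_parts)
    best = None
    for group in chunks(len(centroids), centroid_parts):  # centroid partition
        for j in group:
            # dimension partition: per-slice partial sums, then a reduction
            dist = sum(partial_sq_dist(p, centroids[j], s) for s in dim_slices)
            if best is None or dist < best[0]:
                best = (dist, j)
    return best[1]

def init_farthest(points, k):
    """Deterministic farthest-point seeding (an assumption of this sketch)."""
    centroids = [list(points[0])]
    full = range(len(points[0]))
    while len(centroids) < k:
        far = max(points,
                  key=lambda p: min(partial_sq_dist(p, c, full) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, iters=10):
    """Plain Lloyd iterations; returns the final label of every point."""
    centroids = init_farthest(points, k)
    for _ in range(iters):
        labels = [assign(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return [assign(p, centroids) for p in points]

def silhouette(points, labels):
    """Mean silhouette score; stands in for the paper's evaluation method."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(points)

def auto_kmeans(points, candidates):
    """Run k-means for every candidate k and keep the best-scoring one."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Toy data: three tight 2-D blobs of four points each.
blobs = [(cx + dx, cy + dy)
         for cx, cy in [(0, 0), (10, 0), (0, 10)]
         for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]
best_k, scores = auto_kmeans(blobs, [2, 3, 4, 5])
print(best_k, scores)
```

On this toy data the sweep selects k = 3, matching the three blobs. In a real distributed setting the loops over `dim_slices` and centroid groups would run on separate processing elements, with the sums and minima combined by reductions.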
Pages: 997 - 1008
Number of pages: 12
Related papers (first 10 of 50 listed)
  • [1] Large-Scale Hierarchical k-means for Heterogeneous Many-Core Supercomputers
    Li, Liandeng
    Yu, Teng
    Zhao, Wenlai
    Fu, Haohuan
    Wang, Chenyu
    Tan, Li
    Yang, Guangwen
    Thomson, John
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE, AND ANALYSIS (SC'18), 2018,
  • [2] Scalable k-means for large-scale clustering
    Ming, Yuewei
    Zhu, En
    Wang, Mao
    Liu, Qiang
    Liu, Xinwang
    Yin, Jianping
    INTELLIGENT DATA ANALYSIS, 2019, 23 (04) : 825 - 838
  • [3] Compressed K-Means for Large-Scale Clustering
    Shen, Xiaobo
    Liu, Weiwei
    Tsang, Ivor
    Shen, Fumin
    Sun, Quan-Sen
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2527 - 2533
  • [4] Large-scale k-means clustering via variance reduction
    Zhao, Yawei
    Ming, Yuewei
    Liu, Xinwang
    Zhu, En
    Zhao, Kaikai
    Yin, Jianping
    NEUROCOMPUTING, 2018, 307 : 184 - 194
  • [5] Optimizing Yinyang K-means algorithm on many-core CPUs
    Zhou T.
    Wang Q.
    Li R.
    Mei S.
    Yin S.
    Hao R.
    Liu J.
    Guofang Keji Daxue Xuebao/Journal of National University of Defense Technology, 2024, 46 (01) : 93 - 102
  • [6] Large-Scale Molecular Dynamics Simulation Based on Heterogeneous Many-Core Architecture
    Zhou, Xu
    Wei, Zhiqiang
    Lu, Hao
    He, Jiaqi
    Gao, Yuan
    Hu, Xiaotong
    Wang, Cunji
    Dong, Yujie
    Liu, Hao
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2024, 64 (03) : 851 - 861
  • [7] Regularized and Sparse Stochastic K-Means for Distributed Large-Scale Clustering
    Jumutc, Vilen
    Langone, Rocco
    Suykens, Johan A. K.
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2535 - 2540
  • [8] Fast K-means for Large Scale Clustering
    Hu, Qinghao
    Wu, Jiaxiang
    Bai, Lu
    Zhang, Yifan
    Cheng, Jian
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 2099 - 2102
  • [9] Efficient Large-Scale Virtual Screening Based on Heterogeneous Many-Core Supercomputing System
    Liu, Hao
    Wang, Cunji
    Liu, Peng
    Liu, Chengchao
    Wang, Zhuoya
    Wei, Zhiqiang
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2023, 27 (07) : 3579 - 3588
  • [10] Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors
    Li, Mingzhen
    Liu, Yi
    Yang, Hailong
    Hu, Yongmin
    Sun, Qingxiao
    Chen, Bangduo
    You, Xin
    Liu, Xiaoyan
    Luan, Zhongzhi
    Qian, Depei
    50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2021,