Min-max kurtosis stratum mean: An improved K-means cluster initialization approach for microarray gene clustering on multidimensional big data

Times cited: 3
Authors
Pandey, Kamlesh Kumar [1]
Shukla, Diwakar [1]
Affiliation
[1] Dr Hari Singh Gour Vishwavidyalaya, Department of Computer Science & Applications, Sagar, Madhya Pradesh, India
Keywords
big data clustering; gene clustering; initial centroid; K-means; microarray clustering; multidimensional clustering; EXPRESSION DATA; MEANS ALGORITHM; EVOLUTION; SELECTION; SIZE
DOI
10.1002/cpe.7185
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Microarray gene clustering is a big data application that uses the K-means (KM) clustering algorithm to identify hidden patterns, evolutionary relationships, unknown functions, and gene trends for disease diagnosis, tissue detection, and biological analysis. The selection of initial centroids is a major issue in the KM algorithm because it affects the effectiveness and efficiency of clustering and whether the algorithm becomes trapped in a local optimum. Existing centroid initialization algorithms are computationally expensive and degrade cluster quality because of the high dimensionality and interconnectedness of microarray gene data. To address this issue, this study proposes the min-max kurtosis stratum mean (MKSM) algorithm for big data clustering in a single-machine environment. The MKSM algorithm uses kurtosis for dimension selection, mean distance for identifying gene relationships, and stratification for extracting heterogeneous centroids. The proposed algorithm is compared with state-of-the-art initialization strategies on twelve microarray gene datasets using internal, external, and statistical evaluation criteria. The experimental results demonstrate that the MKSMKM algorithm (KM initialized with MKSM) reduces the number of iterations, distance computations, and data comparisons, as well as the tendency to converge to local optima, and improves cluster performance, effectiveness, and efficiency with stable convergence.
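The abstract names the ingredients of MKSM (kurtosis-based dimension selection, mean-based grouping, and stratified centroid extraction) but not their exact rules. The following Python sketch is a hypothetical illustration of how a kurtosis-plus-stratification seeding can feed K-means; it is not the authors' MKSM algorithm. Assumptions: the single column with the largest excess kurtosis drives the ordering, rows are sorted along it and cut into k near-equal strata, and each stratum's mean becomes an initial centroid. The mean-distance step is not modeled, and the names `excess_kurtosis`, `stratum_mean_init`, and `kmeans` are ours.

```python
import math
from statistics import fmean


def excess_kurtosis(values):
    """Excess kurtosis from population moments: m4 / m2**2 - 3."""
    mu = fmean(values)
    m2 = fmean((v - mu) ** 2 for v in values)
    m4 = fmean((v - mu) ** 4 for v in values)
    if m2 == 0:
        return 0.0  # constant column: no spread, so treat as uninformative
    return m4 / (m2 * m2) - 3.0


def stratum_mean_init(data, k):
    """Hypothetical kurtosis + stratification seeding (NOT the published MKSM rules).

    Assumed steps: pick the column with the largest excess kurtosis, sort rows
    along it, cut the sorted rows into k contiguous strata of near-equal size,
    and return the mean of each stratum as an initial centroid.
    """
    n, d = len(data), len(data[0])
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")
    kurt = [excess_kurtosis([row[j] for row in data]) for j in range(d)]
    dim = max(range(d), key=lambda j: kurt[j])
    ordered = sorted(data, key=lambda row: row[dim])
    centroids = []
    for s in range(k):
        stratum = ordered[s * n // k:(s + 1) * n // k]  # never empty when k <= n
        centroids.append([fmean(col) for col in zip(*stratum)])
    return centroids


def kmeans(data, centroids, max_iter=100):
    """Plain Lloyd iterations from given seeds; returns (labels, centroids, iterations)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for it in range(1, max_iter + 1):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(row, centroids[c]))
            for row in data
        ]
        if new_labels == labels:  # assignments stable: converged
            return labels, centroids, it
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [fmean(col) for col in zip(*members)]
    return labels, centroids, max_iter


if __name__ == "__main__":
    # Toy "expression matrix": 6 genes x 3 samples forming two obvious groups.
    genes = [
        [1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.8, 1.0, 1.1],
        [8.0, 8.2, 7.9], [7.8, 8.1, 8.0], [8.1, 7.9, 8.2],
    ]
    seeds = stratum_mean_init(genes, k=2)
    labels, centroids, iterations = kmeans(genes, seeds)
    print(labels, iterations)  # → [0, 0, 0, 1, 1, 1] 2
```

On this toy matrix the strata already separate the two groups, so Lloyd's loop only confirms convergence on its second pass, which is the kind of iteration saving a good seeding aims for. Whether the highest- or lowest-kurtosis column (or both, as "min-max" may suggest) should drive the stratification is left open here; swapping `max` for `min` in `stratum_mean_init` is a one-line change.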
Pages: 33