Min-max kurtosis stratum mean: An improved K-means cluster initialization approach for microarray gene clustering on multidimensional big data

Times cited: 6
Authors
Pandey, Kamlesh Kumar [1]
Shukla, Diwakar [1]
Affiliations
[1] Dr Hari Singh Gour Vishwavidyalaya, Dept Comp Sci & Applicat, Sagar, Madhya Pradesh, India
Keywords
big data clustering; gene clustering; initial centroid; K-means; microarray clustering; multidimensional clustering; EXPRESSION DATA; MEANS ALGORITHM; EVOLUTION; SELECTION; SIZE;
DOI
10.1002/cpe.7185
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Microarray gene clustering is a big data application that employs the K-means (KM) clustering algorithm to identify hidden patterns, evolutionary relationships, unknown functions and gene trends for disease diagnosis, tissue detection and biological analysis. The selection of initial centroids is a major issue in the KM algorithm because it influences the effectiveness and efficiency of clustering and determines which local optimum the algorithm converges to. Existing centroid initialization algorithms are computationally expensive and degrade cluster quality because microarray gene data are high-dimensional and interconnected. To address this issue, this study proposes the min-max kurtosis stratum mean (MKSM) algorithm for big data clustering in a single-machine environment. The MKSM algorithm uses kurtosis for dimension selection, mean distance for gene relationship identification, and stratification for heterogeneous centroid extraction. The results of the presented algorithm are compared with the state-of-the-art initialization strategy on twelve microarray gene datasets using internal, external and statistical assessment criteria. The experimental results demonstrate that the MKSMKM algorithm reduces the number of iterations, distance computations, data comparisons and local optima, and improves cluster performance, effectiveness and efficiency with stable convergence.
Pages: 33
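The abstract names three ingredients: kurtosis for dimension selection, a mean-based gene score, and stratification for centroid extraction. The Python sketch below shows one way those ingredients could fit together. It is an illustration under assumptions, not the published MKSM algorithm: this record gives only the abstract, so the choice of the minimum- and maximum-kurtosis columns, the row-mean score, the equal-size strata, and the function names `excess_kurtosis` and `stratum_mean_init` are all hypothetical.

```python
import numpy as np


def excess_kurtosis(X):
    """Per-column excess kurtosis using the population (biased) formula."""
    centered = X - X.mean(axis=0)
    var = (centered ** 2).mean(axis=0)
    m4 = (centered ** 4).mean(axis=0)
    # Constant columns have zero variance; report kurtosis 0 for them.
    safe_var = np.where(var > 0, var, 1.0)
    return np.where(var > 0, m4 / safe_var ** 2 - 3.0, 0.0)


def stratum_mean_init(X, k):
    """Illustrative kurtosis-and-strata seeding (assumed steps, see lead-in)."""
    X = np.asarray(X, dtype=float)
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of rows")
    kurt = excess_kurtosis(X)
    # Assumption: keep only the min- and max-kurtosis dimensions.
    dims = np.unique([np.argmin(kurt), np.argmax(kurt)])
    # Assumption: score each row by its mean over the selected dimensions.
    score = X[:, dims].mean(axis=1)
    order = np.argsort(score, kind="stable")
    # Assumption: k near-equal strata along the sorted scores.
    strata = np.array_split(order, k)
    # Each initial centroid is the full-dimensional mean of one stratum.
    return np.vstack([X[idx].mean(axis=0) for idx in strata])


demo = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]])
centroids = stratum_mean_init(demo, k=2)
```

The returned array has one row per cluster and could be passed as the starting centroids to any K-means implementation that accepts explicit initial centers.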