Min-max kurtosis mean distance based k-means initial centroid initialization method for big genomic data clustering

Cited by: 5

Authors:

Pandey, Kamlesh Kumar [1]

Shukla, Diwakar [1]

Affiliations:

[1] Dr Hari Singh Gour Vishwavidyalaya, Dept Comp Sci & Applicat, Sagar, Madhya Pradesh, India

Source:

EVOLUTIONARY INTELLIGENCE | 2023 / Vol. 16 / Issue 03

Keywords:

Big data clustering; Initial centroid algorithm; Genome clustering; Kurtosis clustering; Convergence speed; K-Means; GENETIC ALGORITHM;

DOI:

10.1007/s12065-022-00720-3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Genomic clustering is a big data application that uses the K-means (KM) clustering approach to discover hidden patterns and trends in genes for disease diagnosis, biological analysis, and tissue detection. The KM algorithm depends heavily on the initial centroids, which determine the effectiveness, efficiency, and computing resource use of KM clustering and whether it converges to a local optimum. Existing centroid initialization approaches become trapped in local optima because of randomization and incur high computational cost because of the large number of interrelated dimensions. As a result, the KM algorithm produces low-quality clusters and consumes more computation time and resources. To address this issue, this study presents the Min-Max Kurtosis Mean Distance (MKMD) algorithm for big data clustering in a single-machine environment. The MKMD algorithm enhances the effectiveness and efficiency of the KM algorithm by measuring the distance between the data points of the minimum- and maximum-kurtosis dimensions and their mean. The performance of the presented algorithm is compared against the KM, KM++, ADV, MKM, Mean-KM, NFD, K-MAM, NRKM2, FMNN, and MuKM algorithms using internal and external effectiveness evaluation criteria, together with an efficiency assessment, on sixteen genomic datasets. The experimental results show that, compared with the other algorithms, the MKMDKM algorithm reduces iterations, distance computations, data comparisons, local optima, and resource consumption, and improves cluster performance, effectiveness, and efficiency with stable convergence and results. According to the statistical analysis, the proposed MKMDKM algorithm achieves statistical significance under the Friedman test and the post hoc test.
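
The abstract does not spell out the individual steps of MKMD, so the following is only a rough sketch of what a deterministic, kurtosis-based centroid initialization could look like. Everything in it is an assumption made for illustration: the function names `kurtosis` and `mkmd_like_init`, the use of population kurtosis, the Euclidean distance to the mean in the two selected dimensions, and the equal-size partitioning of the sorted points. It should not be read as the authors' algorithm.

```python
import math


def kurtosis(values):
    """Population (Pearson) kurtosis m4 / m2**2 of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # A constant column has zero variance; report 0.0 instead of dividing by zero.
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def mkmd_like_init(points, k):
    """Hypothetical initial-centroid sketch inspired by the abstract.

    Assumed steps (not taken from the paper):
      1. pick the dimensions with the smallest and largest kurtosis;
      2. measure each point's Euclidean distance to the mean,
         using only those two dimensions;
      3. sort the points by that distance and split them into k
         contiguous groups of (nearly) equal size;
      4. return each group's full-dimensional mean as a centroid.
    """
    n, d = len(points), len(points[0])
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")

    # Step 1: kurtosis of every dimension, then the min and max dimensions.
    kurt = [kurtosis([p[j] for p in points]) for j in range(d)]
    lo = min(range(d), key=kurt.__getitem__)
    hi = max(range(d), key=kurt.__getitem__)

    # Step 2: distance of each point to the mean in the (lo, hi) plane.
    mean_lo = sum(p[lo] for p in points) / n
    mean_hi = sum(p[hi] for p in points) / n

    def dist_to_mean(p):
        return math.hypot(p[lo] - mean_lo, p[hi] - mean_hi)

    # Step 3: deterministic ordering, no random sampling involved.
    ordered = sorted(points, key=dist_to_mean)

    # Step 4: one centroid per contiguous group.
    centroids = []
    for g in range(k):
        chunk = ordered[g * n // k:(g + 1) * n // k]
        centroids.append([sum(p[j] for p in chunk) / len(chunk) for j in range(d)])
    return centroids
```

Because the sketch involves no random sampling, repeated calls on the same data return the same centroids, which is the property the abstract contrasts with randomized initialization. The resulting list could be passed as the starting centroids to any standard K-means implementation.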

Pages: 1055-1076

Page count: 22
