Min-max kurtosis mean distance based k-means initial centroid initialization method for big genomic data clustering

Cited by: 5
Authors
Pandey, Kamlesh Kumar [1]
Shukla, Diwakar [1]
Affiliations
[1] Dr Hari Singh Gour Vishwavidyalaya, Dept Comp Sci & Applicat, Sagar, Madhya Pradesh, India
Keywords
Big data clustering; Initial centroid algorithm; Genome clustering; Kurtosis clustering; Convergence speed; K-means; Genetic algorithm
DOI
10.1007/s12065-022-00720-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Genomic clustering is a big data application that uses the K-means (KM) clustering approach to discover hidden patterns and trends in genes for disease diagnosis, biological analysis, and tissue detection. The KM algorithm depends heavily on its initial centroids, which determine the effectiveness, efficiency, and computing-resource use of KM clustering and whether it becomes trapped in a local optimum. Existing centroid initialization approaches tend to get trapped in local optima because of randomization and incur high computational cost because of the large number of interrelated dimensions. As a result, the KM algorithm produces low-quality clusters and consumes more computation time and resources. To address this issue, this study presents the Min-Max Kurtosis Mean Distance (MKMD) algorithm for big data clustering in a single-machine environment. The MKMD algorithm improves the effectiveness and efficiency of the KM algorithm by measuring the distance between the data points of the minimum- and maximum-kurtosis dimensions and their mean. The performance of the presented algorithm is compared against the KM, KM++, ADV, MKM, Mean-KM, NFD, K-MAM, NRKM2, FMNN, and MuKM algorithms using internal and external effectiveness evaluation criteria, together with an efficiency assessment, on sixteen genomic datasets. The experimental results show that the MKMDKM algorithm reduces iterations, distance computations, data comparisons, local optima, and resource consumption, and improves cluster performance, effectiveness, and efficiency, with more stable convergence and results than the other algorithms. According to the statistical analysis, the Friedman test and a post hoc test indicate that the improvement achieved by the proposed MKMDKM algorithm is statistically significant.
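The abstract names the ingredients of the initializer (the minimum- and maximum-kurtosis dimensions, the mean, and a distance to that mean) but not the exact steps. The Python sketch below is only one plausible reading, not the authors' published procedure: it ranks points by their distance to the mean in the two selected dimensions, splits that ranking into k contiguous groups, and uses each group's mean as an initial centroid. The function names `kurtosis` and `mkmd_like_init`, the grouping rule, and the use of Pearson kurtosis (m4 / m2^2, without subtracting 3) are all assumptions made for illustration.

```python
import math


def kurtosis(values):
    """Pearson kurtosis m4 / m2^2 of a list of numbers (assumed convention).

    Returns 0.0 for a constant column to avoid dividing by zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def mkmd_like_init(points, k):
    """Hypothetical deterministic initializer inspired by the abstract.

    points: list of equal-length lists of floats; k: number of centroids.
    """
    n = len(points)
    if k < 1 or k > n:
        raise ValueError("k must be between 1 and the number of points")
    dims = len(points[0])

    # Kurtosis of every dimension, then the two extreme dimensions.
    kurt = [kurtosis([p[d] for p in points]) for d in range(dims)]
    d_min = kurt.index(min(kurt))
    d_max = kurt.index(max(kurt))

    # Mean of the data in the two selected dimensions.
    mean_min = sum(p[d_min] for p in points) / n
    mean_max = sum(p[d_max] for p in points) / n

    # Euclidean distance of a point to that mean (assumed distance measure).
    def dist(p):
        return math.hypot(p[d_min] - mean_min, p[d_max] - mean_max)

    # Assumed grouping rule: sort by distance, cut into k contiguous
    # groups of nearly equal size, and average each group over all dims.
    ordered = sorted(points, key=dist)
    centroids = []
    for i in range(k):
        group = ordered[i * n // k:(i + 1) * n // k]
        centroids.append(
            [sum(p[d] for p in group) / len(group) for d in range(dims)]
        )
    return centroids
```

Because nothing in this sketch is random, repeated calls on the same data return the same centroids, which is the property the abstract contrasts with randomized initialization. The returned list could then be passed to any K-means implementation that accepts explicit starting centroids.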
Pages: 1055-1076
Page count: 22