NIFTI: An evolutionary approach for finding number of clusters in microarray data

Cited by: 4
Authors
Jonnalagadda, Sudhakar [1]
Srinivasan, Rajagopalan [1]
Affiliations
[1] Natl Univ Singapore, Dept Chem & Biomol Engn, Singapore 119260, Singapore
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
GENE-EXPRESSION DATA; VALIDATION TECHNIQUES; DATA SET; PATTERNS; VALIDITY; MODEL;
DOI
10.1186/1471-2105-10-40
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Clustering techniques are routinely used in gene expression data analysis to organize the massive data. Clustering techniques arrange a large number of genes or assays into a few clusters while maximizing the intra-cluster similarity and inter-cluster separation. While clustering of genes facilitates learning the functions of uncharacterized genes using their association with known genes, clustering of assays reveals the disease stages and subtypes. Many clustering algorithms require the user to specify the number of clusters a priori. A wrong specification of the number of clusters generally leads to either failure to detect novel clusters (disease subtypes) or unnecessary splitting of natural clusters. Results: We have developed a novel method to find the number of clusters in gene expression data. Our procedure evaluates different partitions (each with a different number of clusters) from the clustering algorithm and finds the partition that best describes the data. In contrast to the existing methods that evaluate the partitions independently, our procedure considers the dynamic rearrangement of cluster members when a new cluster is added. Partition quality is measured based on a new index called Net InFormation Transfer Index (NIFTI) that measures the information change when an additional cluster is introduced. Information content of a partition increases when clusters do not intersect and decreases if they are not clearly separated. The partition with the highest Total Information Content (TIC) is selected as the optimal one. We illustrate our method using four publicly available microarray datasets. Conclusion: In all four case studies, the proposed method correctly identified the number of clusters and performed better than other well-known methods. Our method also showed invariance to the clustering techniques.
Pages: 13
Related papers
50 in total
  • [1] An evolutionary algorithm for clustering data streams with a variable number of clusters
    Silva, Jonathan de Andrade
    Hruschka, Eduardo Raul
    Gama, Joao
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 67 : 228 - 238
  • [2] MCS: A Method for Finding the Number of Clusters
    Albatineh, Ahmed N.
    Niewiadomska-Bugaj, Magdalena
    JOURNAL OF CLASSIFICATION, 2011, 28 (02) : 184 - 209
  • [3] Finding the number of clusters in ordered dissimilarities
    Sledge, Isaac J.
    Havens, Timothy C.
    Huband, Jacalyn M.
    Bezdek, James C.
    Keller, James M.
    SOFT COMPUTING, 2009, 13 (12) : 1125 - 1142
  • [4] Evolutionary biclustering algorithms: an experimental study on microarray data
    Maatouk, Ons
    Ayadi, Wassim
    Bouziri, Hend
    Duval, Beatrice
    SOFT COMPUTING, 2019, 23 (17) : 7671 - 7697
  • [5] Selection of the number of clusters in functional data analysis
    Zambom, Adriano Zanin
    Alfonso Collazos, Julian
    Dias, Ronaldo
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2022, 92 (14) : 2980 - 2998
  • [6] Feature Selection and Classification by using Grid Computing based Evolutionary Approach for the Microarray Data
    Chen, T. -C.
    Hsieh, Y. -C.
    You, P. -S.
    Lee, Y. -C.
    PROCEEDINGS OF 2010 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, VOL 9 (ICCSIT 2010), 2010 : 85 - 89
  • [7] BiHEA: A Hybrid Evolutionary Approach for Microarray Biclustering
    Andres Gallo, Cristian
    Andrea Carballido, Jessica
    Ponzoni, Ignacio
    ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, PROCEEDINGS, 2009, 5676 : 36 - 47
  • [8] A New Evolutionary Ensemble Learning of Multimodal Feature Selection from Microarray Data
    Nekouie, Nadia
    Romoozi, Morteza
    Esmaeili, Mahdi
    NEURAL PROCESSING LETTERS, 2023, 55 (05) : 6753 - 6780
  • [9] A NEW APPROACH FOR DETERMINING NUMBER OF CLUSTERS
    Erisoglu, Murat
    Erisoglu, Ulku
    Servi, Tayfun
    Sakallioglu, Sadullah
    PAKISTAN JOURNAL OF STATISTICS, 2012, 28 (01) : 141 - 158
  • [10] Determine the number of clusters by data augmentation
    Luo, Wei
    ELECTRONIC JOURNAL OF STATISTICS, 2022, 16 (02) : 3910 - 3936