NIFTI: An evolutionary approach for finding number of clusters in microarray data

Cited by: 4
Authors
Jonnalagadda, Sudhakar [1]
Srinivasan, Rajagopalan [1]
Affiliations
[1] Natl Univ Singapore, Dept Chem & Biomol Engn, Singapore 119260, Singapore
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
GENE-EXPRESSION DATA; VALIDATION TECHNIQUES; DATA SET; PATTERNS; VALIDITY; MODEL;
DOI
10.1186/1471-2105-10-40
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Clustering techniques are routinely used in gene expression data analysis to organize massive datasets. They arrange a large number of genes or assays into a few clusters while maximizing intra-cluster similarity and inter-cluster separation. Clustering of genes helps infer the functions of uncharacterized genes from their association with known genes, while clustering of assays reveals disease stages and subtypes. Many clustering algorithms require the user to specify the number of clusters a priori. A wrong specification of the number of clusters generally leads either to a failure to detect novel clusters (disease subtypes) or to unnecessary splitting of natural clusters.

Results: We have developed a novel method to find the number of clusters in gene expression data. Our procedure evaluates different partitions (each with a different number of clusters) produced by the clustering algorithm and finds the partition that best describes the data. In contrast to existing methods, which evaluate the partitions independently, our procedure considers the dynamic rearrangement of cluster members when a new cluster is added. Partition quality is measured with a new index called the Net InFormation Transfer Index (NIFTI), which measures the information change when an additional cluster is introduced. The information content of a partition increases when clusters do not intersect and decreases if they are not clearly separated. The partition with the highest Total Information Content (TIC) is selected as the optimal one. We illustrate our method using four publicly available microarray datasets.

Conclusion: In all four case studies, the proposed method correctly identified the number of clusters and performed better than other well-known methods. Our method also showed invariance to the clustering technique used.
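The general workflow the abstract describes (cluster the data for several candidate numbers of clusters, score each partition, keep the best one) can be sketched as follows. This is only an illustration under stated assumptions: the record does not give the NIFTI or TIC formulas, so the mean silhouette width is used here as a plainly swapped-in stand-in index, and it scores each partition independently rather than tracking member rearrangement as NIFTI is said to do. The helper names (`kmeans`, `silhouette`), the toy data, and the candidate range 2 to 6 are all hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=50):
    """Plain k-means with farthest-first initialisation (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre, then recompute centres.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width: a stand-in validity index, NOT NIFTI or TIC."""
    k = max(labels) + 1
    clusters = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in c) / len(c)
                for j, c in enumerate(clusters) if j != lab and c)
        total += (b - a) / max(a, b)
    return total / len(points)

# Toy data (assumption): three well-separated 2-D blobs of 20 points each.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(20)]

# Score one partition per candidate k and keep the best-scoring k.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k)
```

Any other validity index with the same signature could be dropped in for `silhouette` without changing the selection loop.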
Pages: 13
Related papers
50 in total
  • [31] Estimating the number of clusters in multivariate data by various fittings of the L-curve
    Moustafa, Rida
    Hadi, Ali S.
    COMPUTATIONAL & APPLIED MATHEMATICS, 2025, 44 (01)
  • [32] Fuzzy Clustering to Identify Clusters at Different Levels of Fuzziness: An Evolutionary Multiobjective Optimization Approach
    Gupta, Avisek
    Datta, Shounak
    Das, Swagatam
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (05) : 2601 - 2611
  • [33] Detecting the Maximum Similarity Bi-Clusters of Gene Expression Data with Evolutionary Computation
    Peng, Xinjuan
    Cai, Lijun
    Liao, Bo
    Chen, Haowen
    Zhu, Wen
    JOURNAL OF COMPUTATIONAL AND THEORETICAL NANOSCIENCE, 2014, 11 (07) : 1585 - 1591
  • [34] KERNEL-BASED PARAMETRIC VALIDITY INDEX FOR ASSESSING CLUSTERS FROM MICROARRAY GENE EXPRESSION DATA
    Fa, Rui
    Nandi, Asoke K.
    2012 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2012
  • [35] A Novel Approach for Discovering Overlapping Clusters in Gene Expression Data
    Ma, Patrick C. H.
    Chan, Keith C. C.
    IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2009, 56 (07) : 1803 - 1809
  • [36] BIOLOGICAL ANALYSIS OF MICROARRAY DATA USING ORTHOGONAL FORWARD SELECTION WITH A CLUSTERING APPROACH
    Kah, Wong Sou
    Moorthy, Kohbalan
    Mohamad, Mohd Saberi
    Kasim, Shahreen
    Deris, Safaai
    Omatu, Sigeru
    Yoshioka, Michifumi
    JOURNAL OF BIOLOGICAL SYSTEMS, 2015, 23 (02) : 275 - 288
  • [37] A New Approach for Feature Selection from Microarray Data Based on Mutual Information
    Tang, Jian
    Zhou, Shuigeng
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2016, 13 (06) : 1004 - 1015
  • [38] A novel fuzzy clustering approach to regionalise watersheds with an automatic determination of optimal number of clusters
    Senent-Aparicio, Javier
    Soto, Jesus
    Perez-Sanchez, Julio
    Garrido, Jorge
    JOURNAL OF HYDROLOGY AND HYDROMECHANICS, 2017, 65 (04) : 359 - 365
  • [39] I-nice: A new approach for identifying the number of clusters and initial cluster centres
    Masud, Md Abdul
    Huang, Joshua Zhexue
    Wei, Chenghao
    Wang, Jikui
    Khan, Imran
    Zhong, Ming
    INFORMATION SCIENCES, 2018, 466 : 129 - 151
  • [40] ENTROPY-BASED CLUSTER VALIDATION AND ESTIMATION OF THE NUMBER OF CLUSTERS IN GENE EXPRESSION DATA
    Novoselova, Natalia
    Tom, Igor
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2012, 10 (05)