NIFTI: An evolutionary approach for finding number of clusters in microarray data

Cited by: 4
Authors
Jonnalagadda, Sudhakar [1]
Srinivasan, Rajagopalan [1]
Affiliations
[1] Natl Univ Singapore, Dept Chem & Biomol Engn, Singapore 119260, Singapore
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
GENE-EXPRESSION DATA; VALIDATION TECHNIQUES; DATA SET; PATTERNS; VALIDITY; MODEL;
DOI
10.1186/1471-2105-10-40
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Clustering techniques are routinely used in gene expression data analysis to organize large volumes of data. They arrange a large number of genes or assays into a few clusters while maximizing intra-cluster similarity and inter-cluster separation. Clustering of genes helps infer the functions of uncharacterized genes from their association with known genes, while clustering of assays reveals disease stages and subtypes. Many clustering algorithms require the user to specify the number of clusters a priori. A wrong specification of the number of clusters generally leads either to a failure to detect novel clusters (disease subtypes) or to unnecessary splitting of natural clusters.
Results: We have developed a novel method to find the number of clusters in gene expression data. Our procedure evaluates different partitions (each with a different number of clusters) produced by the clustering algorithm and finds the partition that best describes the data. In contrast to existing methods, which evaluate the partitions independently, our procedure considers the dynamic rearrangement of cluster members when a new cluster is added. Partition quality is measured with a new index, the Net InFormation Transfer Index (NIFTI), which measures the change in information when an additional cluster is introduced. The information content of a partition increases when clusters do not intersect and decreases if they are not clearly separated. The partition with the highest Total Information Content (TIC) is selected as the optimal one. We illustrate our method using four publicly available microarray datasets.
Conclusion: In all four case studies, the proposed method correctly identified the number of clusters and performed better than other well-known methods. Our method was also invariant to the choice of clustering technique.
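The general workflow described in the abstract (generate partitions with different numbers of clusters, score each, keep the best) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the NIFTI or TIC formulas, so the sketch substitutes the average silhouette width as the partition score and a plain k-means with farthest-first initialization as the clustering algorithm. Unlike NIFTI, the silhouette scores each partition independently and does not track how members move when a cluster is added. All function names and the toy data are hypothetical.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Average silhouette width (stand-in score, NOT the paper's NIFTI/TIC)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_k(points, k_values):
    """Score one partition per candidate k and return the best k and all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): three well-separated 2-D groups of 20 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(20)]

best_k, scores = select_k(points, range(2, 7))
print(best_k, scores)
```

Swapping `silhouette` for another index only requires a function with the same `(points, labels)` signature; an index in the spirit of NIFTI would additionally need access to the partition obtained with one cluster fewer.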
Pages: 13
Related papers
50 in total
  • [21] Deep learning approach for microarray cancer data classification
    Basavegowda, Hema Shekar
    Dagnew, Guesh
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2020, 5 (01) : 22 - 33
  • [22] An integrated hierarchical Bayesian approach to normalizing left-censored microRNA microarray data
    Kang, Jia
    Xu, Ethan Yixun
    BMC GENOMICS, 2013, 14
  • [23] Determining the number of clusters using information entropy for mixed data
    Liang, Jiye
    Zhao, Xingwang
    Li, Deyu
    Cao, Fuyuan
    Dang, Chuangyin
    PATTERN RECOGNITION, 2012, 45 (06) : 2251 - 2265
  • [24] Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters
    Jia, Hong
    Cheung, Yiu-Ming
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (08) : 3308 - 3325
  • [25] Estimation of the Number of Clusters in Multipath Radio Channel Data Sets
    Mota, Susana
    Perez-Fontan, Fernando
    Rocha, Armando
    IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, 2013, 61 (05) : 2879 - 2883
  • [26] An evolutionary data mining approach on hydrological data with classifier juries
    Segretier, Wilfried
    Clergue, Manuel
    Collard, Martine
    Izquierdo, Luis
    2012 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2012,
  • [27] Mining Subspace Clusters from DNA Microarray Data Using Large Itemset Techniques
    Chang, Ye-In
    Chen, Jiun-Rung
    Tsai, Yueh-Chi
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2009, 16 (05) : 745 - 768
  • [28] A Multi-Objective Approach to Discover Biclusters in Microarray Data
    Divina, Federico
    Aguilar-Ruiz, Jesus S.
    GECCO 2007: GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, VOL 1 AND 2, 2007, : 385 - 392
  • [29] A Meta-learning approach for recommending the number of clusters for clustering algorithms
    Pimentel, Bruno Almeida
    de Carvalho, Andre C. P. L. F.
    KNOWLEDGE-BASED SYSTEMS, 2020, 195
  • [30] Finding Multiple Coherent Biclusters in Microarray Data Using Variable String Length Multiobjective Genetic Algorithm
    Maulik, Ujjwal
    Mukhopadhyay, Anirban
    Bandyopadhyay, Sanghamitra
    IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, 2009, 13 (06) : 969 - 975