NIFTI: An evolutionary approach for finding number of clusters in microarray data

Cited by: 4
Authors
Jonnalagadda, Sudhakar [1]
Srinivasan, Rajagopalan [1]
Affiliations
[1] Natl Univ Singapore, Dept Chem & Biomol Engn, Singapore 119260, Singapore
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
GENE-EXPRESSION DATA; VALIDATION TECHNIQUES; DATA SET; PATTERNS; VALIDITY; MODEL;
DOI
10.1186/1471-2105-10-40
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Background: Clustering techniques are routinely used in gene expression data analysis to organize the massive data. They arrange a large number of genes or assays into a few clusters while maximizing intra-cluster similarity and inter-cluster separation. Clustering of genes facilitates learning the functions of uncharacterized genes through their association with known genes, while clustering of assays reveals disease stages and subtypes. Many clustering algorithms require the user to specify the number of clusters a priori. A wrong specification of the number of clusters generally leads either to failure to detect novel clusters (disease subtypes) or to unnecessary splitting of natural clusters.

Results: We have developed a novel method to find the number of clusters in gene expression data. Our procedure evaluates different partitions (each with a different number of clusters) from the clustering algorithm and finds the partition that best describes the data. In contrast to existing methods, which evaluate the partitions independently, our procedure considers the dynamic rearrangement of cluster members when a new cluster is added. Partition quality is measured with a new index called the Net InFormation Transfer Index (NIFTI), which measures the information change when an additional cluster is introduced. The information content of a partition increases when clusters do not intersect and decreases if they are not clearly separated. The partition with the highest Total Information Content (TIC) is selected as the optimal one. We illustrate our method using four publicly available microarray datasets.

Conclusion: In all four case studies, the proposed method correctly identified the number of clusters and performed better than other well-known methods. Our method also showed invariance to the clustering technique used.
Pages: 13