Analysing microarray expression data through effective clustering

Cited by: 19
Authors
Masciari, E. [1]
Mazzeo, G. M. [1]
Zaniolo, C. [2]
Affiliations
[1] ICAR CNR, I-87036 Arcavacata Di Rende, Italy
[2] Univ Calif Los Angeles, Los Angeles, CA USA
Funding
National Science Foundation (USA);
Keywords
Bioinformatics; Clustering; Biological data analysis; GENE-EXPRESSION; ALGORITHM; CLASSIFICATION; SELECTION;
DOI
10.1016/j.ins.2013.12.003
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The recent advances in genomic technologies and the availability of large-scale microarray datasets call for the development of advanced data analysis techniques, such as data mining and statistical analysis, to cite a few. Among the mining techniques proposed so far, cluster analysis has become a standard method for the analysis of microarray expression data. It can be used both for initial screening of patients and for extraction of disease molecular signatures. Moreover, clustering can be profitably exploited to characterize genes of unknown function and to uncover patterns that can be interpreted as indications of the status of cellular processes. Finally, clustering biological data is useful not only for exploring the data but also for discovering implicit links between the objects. To this end, several clustering approaches have been proposed in order to obtain a good trade-off between the accuracy and the efficiency of the clustering process. In particular, great attention has been devoted to hierarchical clustering algorithms for their accuracy in unsupervised identification and stratification of groups of similar genes or patients, while partition-based approaches are exploited when fast computations are required. Indeed, it is well known that no existing clustering algorithm completely satisfies both accuracy and efficiency requirements; thus, a good clustering algorithm has to be evaluated with respect to some external criteria that are independent of the metric being used to compute clusters. In this paper, we propose a clustering algorithm called M-CLUBS (for Microarray data CLustering Using Binary Splitting) that exhibits higher accuracy than the hierarchical algorithms proposed so far while computing faster than partition-based approaches. Indeed, M-CLUBS is faster and more accurate than other algorithms, including k-means and its recently proposed refinements, as we will show in the experimental section.
The algorithm consists of a divisive phase and an agglomerative phase; during these two phases, the samples are repartitioned using a least quadratic distance criterion possessing unique analytical properties that we exploit to achieve a very fast computation. M-CLUBS derives good clusters without requiring input from users, and it is robust and impervious to noise, while providing better speed and accuracy than methods, such as BIRCH, that are endowed with the same critical properties. Because microarray data are represented as arrays of numeric values, M-CLUBS is well suited to analyzing them, since it is designed to perform well with Euclidean distances. To strengthen the obtained results, the clusters were interpreted by a domain expert and evaluated with quality measures specifically tailored for biological validity assessment. (C) 2013 Elsevier Inc. All rights reserved.
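The two-phase idea in the abstract (binary splitting followed by merging, both driven by a quadratic distance criterion) can be illustrated with a minimal sketch. This is not the authors' M-CLUBS implementation: the function names, the exhaustive axis-parallel split search, and the `max_clusters` and `merge_tolerance` parameters are all assumptions made for illustration, whereas the abstract states that M-CLUBS itself needs no user input. The merge cost uses the standard identity that merging two clusters raises the within-cluster sum of squares by n_a * n_b / (n_a + n_b) times the squared distance between their centroids.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Cluster = List[Point]


def centroid(points: Cluster) -> List[float]:
    """Coordinate-wise mean of a non-empty cluster."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ssq(points: Cluster) -> float:
    """Within-cluster sum of squared Euclidean distances to the centroid."""
    if not points:
        return 0.0
    c = centroid(points)
    return sum((p[d] - c[d]) ** 2 for p in points for d in range(len(c)))


def best_binary_split(points: Cluster) -> Tuple[float, Cluster, Cluster]:
    """Try every axis-parallel cut; return (SSQ reduction, left, right)."""
    best: Tuple[float, Cluster, Cluster] = (0.0, points, [])
    base = ssq(points)
    for d in range(len(points[0])):
        ordered = sorted(points, key=lambda p: p[d])
        for i in range(1, len(ordered)):
            if ordered[i - 1][d] == ordered[i][d]:
                continue  # cannot cut between equal coordinates
            left, right = ordered[:i], ordered[i:]
            gain = base - ssq(left) - ssq(right)
            if gain > best[0]:
                best = (gain, left, right)
    return best


def merge_cost(a: Cluster, b: Cluster) -> float:
    """SSQ increase caused by merging clusters a and b."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def split_then_merge(points: Cluster, max_clusters: int,
                     merge_tolerance: float) -> List[Cluster]:
    """Hypothetical driver: divisive phase, then agglomerative phase."""
    clusters: List[Cluster] = [list(points)]
    # Divisive phase: repeatedly apply the split with the largest SSQ gain.
    while len(clusters) < max_clusters:
        best_gain, best_idx, halves = 0.0, -1, None
        for idx, cl in enumerate(clusters):
            if len(cl) < 2:
                continue
            gain, left, right = best_binary_split(cl)
            if gain > best_gain:
                best_gain, best_idx, halves = gain, idx, (left, right)
        if halves is None:
            break  # no split reduces the SSQ any further
        clusters[best_idx:best_idx + 1] = [halves[0], halves[1]]
    # Agglomerative phase: merge the cheapest pair while it stays cheap.
    while len(clusters) > 1:
        cost, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if cost > merge_tolerance:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

In this sketch the divisive phase deliberately over-splits and the agglomerative phase undoes splits that cost little in sum-of-squares terms. The exhaustive split search is quadratic per dimension and is chosen for clarity only; it makes no attempt to reproduce the fast computation the abstract attributes to M-CLUBS.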
Pages: 32-45
Number of pages: 14
Related papers
50 in total
  • [1] Proximity Measures for Clustering Gene Expression Microarray Data: A Validation Methodology and a Comparative Analysis
    Jaskowiak, Pablo A.
    Campello, Ricardo J. G. B.
    Costa, Ivan G.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2013, 10 (04) : 845 - 857
  • [2] Clustering of Association Rules on Microarray Gene Expression Data
    Alagukumar, S.
    Vanitha, C. Devi Arockia
    Lawrance, R.
    ADVANCED COMPUTING AND INTELLIGENT ENGINEERING, 2020, 1082 : 85 - 97
  • [3] An evolutionary clustering algorithm for gene expression microarray data analysis
    Ma, Patrick C. H.
    Chan, Keith C. C.
    Yao, Xin
    Chiu, David K. Y.
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2006, 10 (03) : 296 - 314
  • [4] Effective Clustering of Microarray Gene Expression Data using Signal Processing and Soft Computing Methods
    Mishra, Purnendu
    Bhoi, Nilamani
    Meher, Jayakishan
    2015 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, SIGNALS, COMMUNICATION AND OPTIMIZATION (EESCO), 2015,
  • [5] Model-based clustering of microarray expression data via latent Gaussian mixture models
    McNicholas, Paul D.
    Murphy, Thomas Brendan
    BIOINFORMATICS, 2010, 26 (21) : 2705 - 2712
  • [6] Clustering microarray gene expression data using enhanced harmony search
    Pandi, M.
    Premalatha, K.
    INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, 2015, 7 (05) : 296 - 306
  • [7] INTEGRATIVE MODEL-BASED CLUSTERING OF MICROARRAY METHYLATION AND EXPRESSION DATA
    Kormaksson, Matthias
    Booth, James G.
    Figueroa, Maria E.
    Melnick, Ari
    ANNALS OF APPLIED STATISTICS, 2012, 6 (03) : 1327 - 1347
  • [8] Clustering of high throughput gene expression data
    Pirim, Harun
    Eksioglu, Burak
    Perkins, Andy D.
    Yuceer, Cetin
    COMPUTERS & OPERATIONS RESEARCH, 2012, 39 (12) : 3046 - 3061
  • [9] Mining microarray gene expression data with unsupervised possibilistic clustering and proximity graphs
    Romdhane, L. B.
    Shili, H.
    Ayeb, B.
    APPLIED INTELLIGENCE, 2010, 33 (02) : 220 - 231
  • [10] Fuzzy Types Clustering for Microarray Data
    Kim, Seo Young
    Choi, Tai Myong
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 4, 2005, 4 : 12 - 15