Analysing microarray expression data through effective clustering

Cited by: 19
Authors
Masciari, E. [1]
Mazzeo, G. M. [1]
Zaniolo, C. [2]
Affiliations
[1] ICAR CNR, I-87036 Arcavacata Di Rende, Italy
[2] Univ Calif Los Angeles, Los Angeles, CA USA
Funding
US National Science Foundation
Keywords
Bioinformatics; Clustering; Biological data analysis; Gene expression; Algorithm; Classification; Selection
DOI
10.1016/j.ins.2013.12.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The recent advances in genomic technologies and the availability of large-scale microarray datasets call for the development of advanced data analysis techniques, such as data mining and statistical analysis, to name a few. Among the mining techniques proposed so far, cluster analysis has become a standard method for the analysis of microarray expression data. It can be used both for the initial screening of patients and for the extraction of disease molecular signatures. Moreover, clustering can be profitably exploited to characterize genes of unknown function and to uncover patterns that can be interpreted as indications of the status of cellular processes. Finally, clustering biological data is useful not only for exploring the data but also for discovering implicit links between the objects. To this end, several clustering approaches have been proposed to obtain a good trade-off between the accuracy and the efficiency of the clustering process. In particular, great attention has been devoted to hierarchical clustering algorithms for their accuracy in the unsupervised identification and stratification of groups of similar genes or patients, while partition-based approaches are exploited when fast computation is required. Indeed, it is well known that no existing clustering algorithm fully satisfies both the accuracy and the efficiency requirements; thus, a clustering algorithm has to be evaluated with respect to external criteria that are independent of the metric used to compute the clusters. In this paper, we propose a clustering algorithm called M-CLUBS (Microarray data CLustering Using Binary Splitting) that achieves higher accuracy than the hierarchical algorithms proposed so far while computing faster than partition-based approaches. As we show in the experimental section, M-CLUBS is faster and more accurate than other algorithms, including k-means and its recently proposed refinements.
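The abstract argues that clusters should be judged by external criteria that do not depend on the distance used to build them, but it does not name the measures it uses. As one common example of such a criterion (an illustration, not necessarily the paper's choice), the Rand index below scores a clustering against reference labels by the fraction of sample pairs on which the two labelings agree. The function name and the sample labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which two labelings agree, i.e. both put
    the pair in the same group or both put it in different groups."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # fewer than two samples: nothing to disagree on
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical reference classes vs. cluster ids produced by some algorithm.
classes = ["ALL", "ALL", "AML", "AML"]
print(rand_index(classes, [0, 0, 1, 1]))  # 1.0
print(rand_index(classes, [0, 1, 1, 1]))  # 0.5
```

Because the index only compares pairwise co-membership, it is unaffected by how clusters are numbered and by the metric the clustering algorithm used; in practice the chance-corrected Adjusted Rand Index is often preferred.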
The algorithm consists of a divisive phase and an agglomerative phase; during these two phases, the samples are repartitioned using a least quadratic distance criterion whose unique analytical properties we exploit to achieve very fast computation. M-CLUBS derives good clusters without requiring input from users, and it is robust and impervious to noise, while providing better speed and accuracy than methods, such as BIRCH, that are endowed with the same critical properties. Because microarray data are represented as arrays of numeric values, M-CLUBS, which is designed to perform well with Euclidean distances, is well suited to analyzing them. To strengthen these results, the obtained clusters were interpreted by a domain expert and evaluated with quality measures specifically tailored to the assessment of biological validity. (C) 2013 Elsevier Inc. All rights reserved.
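The abstract says the samples are repartitioned with a least quadratic distance criterion whose analytical properties enable fast computation, but it gives no formulas. Assuming the criterion is the within-cluster sum of squared errors (SSE), one standard property is Ward's identity: the SSE increase from merging two clusters depends only on their sizes and centroids. The sketch below is not M-CLUBS itself; it uses that identity, with hypothetical helper names and toy data, to score a merge and to find the best single-axis binary split from prefix sums.

```python
import numpy as np


def sse(points):
    """Within-cluster sum of squared Euclidean distances to the centroid."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())


def merge_cost(n_a, mean_a, n_b, mean_b):
    """SSE increase when clusters A and B are merged (Ward's identity):
    SSE(A u B) - SSE(A) - SSE(B) = n_a*n_b/(n_a+n_b) * ||mean_a - mean_b||^2.
    Only sizes and centroids are needed, not the raw points."""
    diff = np.asarray(mean_a, dtype=float) - np.asarray(mean_b, dtype=float)
    return n_a * n_b / (n_a + n_b) * float(diff @ diff)


def best_axis_split(points, axis):
    """Hypothetical divisive step: best threshold split on one coordinate.

    Returns (left_indices, right_indices, sse_reduction), or None if no
    threshold on `axis` separates the points. After sorting, each candidate
    cut is scored in O(d) from prefix sums, because the SSE reduction of a
    split equals the merge cost of its two halves."""
    points = np.asarray(points, dtype=float)
    order = np.argsort(points[:, axis], kind="stable")
    x = points[order]
    n = len(x)
    prefix = np.cumsum(x, axis=0)  # prefix[k-1] = sum of the first k points
    total = prefix[-1]
    best_gain, best_k = -1.0, None
    for k in range(1, n):  # left = x[:k], right = x[k:]
        if x[k - 1, axis] == x[k, axis]:
            continue  # no threshold separates equal values
        mean_left = prefix[k - 1] / k
        mean_right = (total - prefix[k - 1]) / (n - k)
        gain = merge_cost(k, mean_left, n - k, mean_right)
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None:
        return None
    return order[:best_k], order[best_k:], best_gain


# Toy data (hypothetical): two well-separated groups along the first axis.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.2, 0.0]])
left, right, gain = best_axis_split(pts, axis=0)
print(sorted(left.tolist()), sorted(right.tolist()))  # [0, 1] [2, 3]
```

Scanning every axis and keeping the largest gain yields one divisive step, and `merge_cost` can likewise rank candidate merges in an agglomerative pass. The abstract does not state whether M-CLUBS restricts itself to axis-aligned cuts or how it chooses which cluster to split next.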
Pages: 32-45
Number of pages: 14