A biclustering algorithm based on a Bicluster Enumeration Tree: application to DNA microarray data

Cited by: 53
Authors
Ayadi, Wassim [1, 2]
Elloumi, Mourad [1]
Hao, Jin-Kao [2]
Affiliations
[1] Higher Sch Sci & Technol Tunis, UTIC, Tunis 1008, Tunisia
[2] Univ Angers, LERIA, F-49045 Angers, France
Keywords
GENE-EXPRESSION DATA; TIME-COURSE; PATTERNS; CLUSTERS; MODELS
DOI
10.1186/1756-0381-2-9
Chinese Library Classification (CLC)
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: In a number of domains, such as DNA microarray data analysis, we need to simultaneously cluster the rows (genes) and columns (conditions) of a data matrix to identify groups of rows that are coherent with groups of columns. This kind of clustering is called biclustering. Biclustering algorithms are used extensively in DNA microarray data analysis, and more effective ones are still needed.
Methods: We introduce BiMine, a new enumeration algorithm for biclustering of DNA microarray data. The proposed algorithm is based on three original features. First, BiMine relies on a new evaluation function called Average Spearman's rho (ASR). Second, BiMine uses a new tree structure, called the Bicluster Enumeration Tree (BET), to represent the different biclusters discovered during the enumeration process. Third, to avoid the combinatorial explosion of the search tree, BiMine introduces a parametric rule that allows the enumeration process to cut tree branches that cannot lead to good biclusters.
Results: The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data. The experimental results show that BiMine competes well with several other biclustering methods. Moreover, we test biological significance using a gene annotation web tool to show that the proposed method is able to produce biologically relevant biclusters. The software is available to academic users upon request from the authors.
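The abstract names Average Spearman's rho (ASR) as the evaluation function but does not give its formula. The sketch below is therefore only an illustration of the general idea, under the assumption that a bicluster is scored by the mean pairwise Spearman rank correlation among its rows and among its columns, taking the larger of the two. The function names (`ranks`, `spearman_rho`, `mean_pairwise_rho`, `average_spearman_rho`) and the handling of constant vectors are hypothetical choices, not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return cov / (sx * sy)


def mean_pairwise_rho(vectors):
    """Mean Spearman's rho over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(spearman_rho(a, b) for a, b in pairs) / len(pairs)


def average_spearman_rho(matrix):
    """Assumed ASR-style score: the larger of the row-wise and
    column-wise mean pairwise Spearman correlations of a submatrix."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*rows)]
    return max(mean_pairwise_rho(rows), mean_pairwise_rho(cols))
```

Because the score depends only on ranks, a candidate bicluster whose genes rise and fall together across conditions scores close to 1 even when their expression levels differ in scale or offset. A threshold on such a score is one way a pruning rule like the one mentioned in the abstract could be parameterised, though the abstract does not specify the rule.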
Pages: 16