Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

Cited by: 6
Authors
Berard, Caroline [1]
Martin-Magniette, Marie-Laure [1, 2]
Brunaud, Veronique [2]
Aubourg, Sebastien [2]
Robin, Stephane [1]
Affiliations
[1] UMR AgroParisTech, INRA MIA 518, Paris, France
[2] URGV UMR INRA, CNRS, UEVE, Paris, France
Keywords
bivariate Gaussian mixture; hidden Markov model; tiling arrays; unsupervised classification; HIDDEN MARKOV-MODELS; FUNCTIONAL-ANALYSIS; MIXTURE MODEL; CGH DATA; GENOME; NUMBER; EXPRESSION; DATABASE;
DOI
10.2202/1544-6115.1692
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Tiling arrays make possible a large-scale exploration of the genome thanks to probes which cover the whole genome with very high density, up to 2,000,000 probes. Biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work, we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of a precise modeling and of the region classification. The "TAHMMAnnot" package is implemented in R and C and is freely available from CRAN.
Pages: 23