Simultaneously Learning DNA Motif Along with Its Position and Sequence Rank Preferences Through Expectation Maximization Algorithm

Cited by: 11
Authors
Zhang, Zhizhuo [1]
Chang, Cheng Wei [2]
Hugo, Willy [1]
Cheung, Edwin [2]
Sung, Wing-Kin [1, 2]
Affiliations
[1] Natl Univ Singapore, Singapore 117417, Singapore
[2] Genome Inst Singapore, Singapore, Singapore
Keywords
binding preference; expectation maximization; importance sampling; motif finding; TRANSCRIPTION FACTOR; BINDING-SITES; ANDROGEN RECEPTOR; DISCOVERY; CHIP; CHROMATIN; SEARCH
DOI
10.1089/cmb.2012.0233
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses a pure probabilistic mixture model to model the motif's binding features and uses the expectation maximization (EM) algorithm to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: the variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the coregulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs with better matching to the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveal potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the coTFs. The application is available online.
Pages: 237-248
Page count: 12