Systematic identification of conserved motif modules in the human genome

Citations: 24
Authors
Cai, Xiaohui [2]
Hou, Lin [3, 4, 5]
Su, Naifang [3, 4]
Hu, Haiyan [1]
Deng, Minghua [3, 4]
Li, Xiaoman [6]
Affiliations
[1] Univ Cent Florida, Sch Elect Engn & Comp Sci, Orlando, FL 32816 USA
[2] Univ Calif San Diego, Ctr Res Biol Syst, San Diego, CA 92093 USA
[3] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[4] Peking Univ, Ctr Theoret Biol, Beijing 100871, Peoples R China
[5] Beijing Proteome Res Ctr, Beijing Inst Radiat Med, State Key Lab Prote, Beijing 102206, Peoples R China
[6] Univ Cent Florida, Burnett Sch Biomed Sci, Orlando, FL 32816 USA
Funding
National Natural Science Foundation of China;
Keywords
CIS-REGULATORY MODULES; TRANSCRIPTION FACTORS; BINDING; DISCOVERY; ELEMENTS; GENES; EXPRESSION; RECEPTORS; PROMOTER; DATABASE;
DOI
10.1186/1471-2164-11-567
Chinese Library Classification
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The identification of motif modules, groups of multiple motifs that frequently occur together in DNA sequences, is one of the most important tasks in annotating the human genome. Current approaches to identifying motif modules are often restricted to searches within promoter regions or rely on multiple genome alignments. However, promoter regions account for only a limited number of the locations where transcription factor binding sites (TFBSs) can occur, and multiple genome alignments often fail to align binding sites with their true counterparts because these sites are short and degenerate. Results: To identify motif modules systematically, we developed a computational method that covers the entire non-coding regions around human genes and does not rely on multiple genome alignments. First, we selected orthologous DNA blocks approximately 1 kilobase in length based on discontiguous sequence similarity. Next, we scanned the conserved segments in these blocks using known motifs in the TRANSFAC database. Finally, a frequent pattern mining technique was applied to identify motif modules within these blocks. In total, with a false discovery rate cutoff of 0.05, we predicted 3,161,839 motif modules, 90.8% of which are supported by various forms of functional evidence. Compared with experimental data from 14 ChIP-seq experiments, our method predicted, on average, 69.6% of the ChIP-seq peaks with TFBSs of multiple transcription factors (TFs). Our findings also show that many motif modules have distance and order preferences among their motifs, which further supports the functionality of these predictions. Conclusions: Our work provides a large-scale prediction of motif modules in mammals, which will facilitate the understanding of gene regulation in a systematic way.
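The final step of the method, frequent pattern mining over the motifs found in each block, can be illustrated with a small sketch. This is not the authors' implementation: it uses brute-force counting of co-occurring motif sets as a naive stand-in for a real frequent pattern mining algorithm, and it omits the false discovery rate control described in the abstract. The function name `frequent_motif_modules`, the `min_support` and `max_size` parameters, and the toy motif labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_motif_modules(blocks, min_support, max_size=3):
    """Return motif combinations (2..max_size motifs) seen in >= min_support blocks.

    `blocks` is a list of motif-name lists, one list per conserved DNA block.
    Hypothetical helper: a brute-force substitute for frequent pattern mining.
    """
    counts = Counter()
    for motifs in blocks:
        # Count each motif once per block and fix an order so that
        # the same combination always maps to the same tuple key.
        unique = sorted(set(motifs))
        for size in range(2, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Toy input (illustrative labels only, not data from the paper).
toy_blocks = [
    ["E2F1", "SP1", "NFY"],
    ["E2F1", "SP1"],
    ["SP1", "NFY", "OCT1"],
    ["E2F1", "SP1", "NFY"],
]

modules = frequent_motif_modules(toy_blocks, min_support=3)
for combo, support in sorted(modules.items()):
    print(combo, support)
```

Enumerating every combination is only workable for small motif sets per block. At genome scale, a dedicated frequent itemset algorithm that prunes infrequent candidates early would be needed.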
Pages: 10