Evaluation of methods for modeling transcription factor sequence specificity

Cited by: 276
Authors
Weirauch, Matthew T. [1, 2, 3, 4, 5]
Cote, Atina [1, 2]
Norel, Raquel [6]
Annala, Matti [7]
Zhao, Yue [8]
Riley, Todd R. [9, 10]
Saez-Rodriguez, Julio [11]
Cokelaer, Thomas [11]
Vedenko, Anastasia [12, 13]
Talukder, Shaheynoor [1, 2]
Bussemaker, Harmen J. [9, 10]
Morris, Quaid D. [1, 2, 14]
Bulyk, Martha L. [12, 13, 15, 16]
Stolovitzky, Gustavo [6]
Hughes, Timothy R. [1, 2, 14]
Affiliations
[1] Univ Toronto, Banting & Best Dept Med Res, Toronto, ON, Canada
[2] Univ Toronto, Donnelly Ctr, Toronto, ON, Canada
[3] Cincinnati Childrens Hosp Med Ctr, CAGE, Cincinnati, OH USA
[4] Cincinnati Childrens Hosp Med Ctr, Div Rheumatol, Cincinnati, OH USA
[5] Cincinnati Childrens Hosp Med Ctr, Div Biomed Informat, Cincinnati, OH USA
[6] IBM Computat Biol Ctr, New York, NY USA
[7] Tampere Univ Technol, Dept Signal Proc, FIN-33101 Tampere, Finland
[8] Univ Penn, Dept Genet, Philadelphia, PA 19104 USA
[9] Columbia Univ, Dept Biol Sci, New York, NY 10027 USA
[10] Columbia Univ, Ctr Computat Biol & Bioinformat, Med Ctr, New York, NY 10027 USA
[11] EMBL EBI European Bioinformat Inst, Cambridge, England
[12] Brigham & Womens Hosp, Dept Med, Div Genet, Boston, MA 02115 USA
[13] Harvard Univ, Sch Med, Boston, MA USA
[14] Univ Toronto, Dept Mol Genet, Toronto, ON, Canada
[15] Brigham & Womens Hosp, Dept Pathol, Boston, MA 02115 USA
[16] Harvard Univ, Sch Med, Harvard Mit Div Hlth Sci & Technol, Boston, MA USA
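Funding
Academy of Finland; US National Science Foundation; Israel Science Foundation; US National Institutes of Health; Canadian Institutes of Health Research
Keywords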
DNA-BINDING SPECIFICITY; REGULATORY SEQUENCE; SITES; GENOME; RESOLUTION; PROTEIN; IDENTIFICATION; RECOGNITION; LANDSCAPES; EXPRESSION;
DOI
10.1038/nbt.2486
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro-derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
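To make the abstract's terms concrete, the following is a minimal sketch of how a mononucleotide position weight matrix (PWM) can be used to scan a sequence for candidate binding sites, and how a motif's information content can be computed. It is not taken from the paper and does not reproduce any of the 26 evaluated methods: the count matrix `COUNTS`, the pseudocount, the uniform background, the test sequence, and all function names are hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"

# Hypothetical 4-position count matrix (rows = motif positions,
# columns = counts of A, C, G, T). Illustrative values only.
COUNTS = [
    [8, 1, 1, 0],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [2, 2, 2, 4],
]


def to_probabilities(counts, pseudocount=1.0):
    """Convert counts to per-position base probabilities with a pseudocount."""
    probs = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        probs.append([(c + pseudocount) / total for c in row])
    return probs


def to_log_odds(probs, background=0.25):
    """Log2 odds of each base against an assumed uniform background."""
    return [[math.log2(p / background) for p in row] for row in probs]


def information_content(probs):
    """Total information content in bits: sum of (2 - entropy) per position."""
    total = 0.0
    for row in probs:
        entropy = -sum(p * math.log2(p) for p in row if p > 0)
        total += 2.0 - entropy
    return total


def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][BASES.index(b)] for i, b in enumerate(window))
        hits.append((start, score))
    return hits


probs = to_probabilities(COUNTS)
pwm = to_log_odds(probs)
hits = scan("TTACGTACGA", pwm)
best_start, best_score = max(hits, key=lambda h: h[1])
ic = information_content(probs)
print(f"best window starts at {best_start}, score {best_score:.2f} bits")
print(f"information content: {ic:.2f} bits (maximum {2 * len(probs)})")
```

In this sketch each position contributes independently to the score, which is the defining assumption of a mononucleotide PWM. A degenerate position such as the last row of `COUNTS` adds little information content, which is the sense in which a motif can have low information content overall.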
Pages: 126-134
Page count: 9