High-throughput SELEX-SAGE method for quantitative modeling of transcription-factor binding sites

Cited by: 157
Authors
Roulet, E
Busso, S
Camargo, AA
Simpson, AJG
Mermod, N
Bucher, P [1]
Affiliations
[1] Swiss Inst Expt Canc Res, Swiss Inst Bioinformat, CH-1066 Epalinges, Switzerland
[2] EPFL, UNIL, Lab Mol Biotechnol, Ctr Biotechnol, CH-1015 Lausanne, Switzerland
[3] Univ Lausanne, Inst Biol Anim, CH-1015 Lausanne, Switzerland
[4] Ludwig Inst Canc Res, Canc Genet Lab, BR-01509010 Sao Paulo, Brazil
DOI
10.1038/nbt718
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile(1) was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro-selected ligands using standard hidden Markov model training algorithms(2,3). Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX)(4) and serial analysis of gene expression (SAGE)(5) protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores(6). This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
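The abstract mentions a quality-controlled sequence extraction step based on Phred quality scores. As a minimal sketch of what such a filter could look like: a Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10). The function names, the threshold `min_q=20`, and the rule that every base in a site must pass are assumptions for illustration only; the abstract does not specify the criteria actually used.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the estimated error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_quality(quals, min_q=20):
    """Return True if every base in the segment reaches the (assumed) threshold."""
    return all(q >= min_q for q in quals)


def filter_sites(records, min_q=20):
    """Keep sequences whose per-base Phred scores all reach min_q.

    `records` is a list of (sequence, list_of_quality_scores) pairs;
    this input layout is hypothetical, not taken from the paper.
    """
    return [seq for seq, quals in records if passes_quality(quals, min_q)]
```

With this sketch, a candidate site containing even one low-confidence base call is discarded, which trades some yield for a cleaner training set.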
Pages: 831-835
Page count: 5
Related papers
18 in total
[1]   SELECTION OF DNA-BINDING SITES BY REGULATORY PROTEINS - STATISTICAL-MECHANICAL THEORY AND APPLICATION TO OPERATORS AND PROMOTERS [J].
BERG, OG ;
VONHIPPEL, PH .
JOURNAL OF MOLECULAR BIOLOGY, 1987, 193 (04) :723-743
[2]   A flexible motif search technique based on generalized profiles [J].
Bucher, P ;
Karplus, K ;
Moeri, N ;
Hofmann, K .
COMPUTERS & CHEMISTRY, 1996, 20 (01) :3-23
[3]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[4]  
Durbin R., 1998, BIOL SEQUENCE ANAL P
[5]   DNA binding specificity of different STAT proteins: Comparison of in vitro specificity with natural target sites [J].
Ehret, GB ;
Reichenbach, P ;
Schindler, U ;
Horvath, CM ;
Fritz, S ;
Nabholz, M ;
Bucher, P .
JOURNAL OF BIOLOGICAL CHEMISTRY, 2001, 276 (09) :6675-6688
[6]   Base-calling of automated sequencer traces using phred. I. Accuracy assessment [J].
Ewing, B ;
Hillier, L ;
Wendl, MC ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :175-185
[7]   Quantitative specificity of the Mnt repressor [J].
Fields, DS ;
He, YY ;
AlUzri, AY ;
Stormo, GD .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 271 (02) :178-194
[8]   In vitro selection of integration host factor binding sites [J].
Goodman, SD ;
Velten, NJ ;
Gao, QA ;
Robinson, S ;
Segall, AM .
JOURNAL OF BACTERIOLOGY, 1999, 181 (10) :3246-3255
[9]  
Hughey R, 1996, COMPUT APPL BIOSCI, V12, P95
[10]   ALL YOU WANTED TO KNOW ABOUT SELEX [J].
KLUG, SJ ;
FAMULOK, M .
MOLECULAR BIOLOGY REPORTS, 1994, 20 (02) :97-107