Searching for transcription factor binding sites in vector spaces

Times cited: 7
Authors
Lee, Chih [1]
Huang, Chun-Hsi [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
National Science Foundation (USA);
Keywords
DISCOVERY; SEQUENCES; DATABASE; MODEL; TOOL;
DOI
10.1186/1471-2105-13-215
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Computational approaches to transcription factor binding site identification have been actively researched in the past decade. Learning from known binding sites, new binding sites of a transcription factor in unannotated sequences can be identified. A number of search methods have been introduced over the years. However, one can rarely find one single method that performs the best on all the transcription factors. Instead, to identify the best method for a particular transcription factor, one usually has to compare a handful of methods. Hence, it is highly desirable for a method to perform automatic optimization for individual transcription factors.

Results: We proposed to search for transcription factor binding sites in vector spaces. This framework allows us to identify the best method for each individual transcription factor. We further introduced two novel methods, the negative-to-positive vector (NPV) and optimal discriminating vector (ODV) methods, to construct query vectors to search for binding sites in vector spaces. Extensive cross-validation experiments showed that the proposed methods significantly outperformed the ungapped likelihood under positional background method, a state-of-the-art method, and the widely-used position-specific scoring matrix method. We further demonstrated that motif subtypes of a TF can be readily identified in this framework and two variants called the kNPV and kODV methods benefited significantly from motif subtype identification. Finally, independent validation on ChIP-seq data showed that the ODV and NPV methods significantly outperformed the other compared methods.

Conclusions: We conclude that the proposed framework is highly flexible. It enables the two novel methods to automatically identify a TF-specific subspace to search for binding sites. Implementations are available as source code at: http://biogrid.engr.uconn.edu/tfbs_search/.
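The idea of searching with a query vector can be illustrated with a minimal sketch. The abstract does not specify how sequences are embedded or how the NPV and ODV query vectors are built, so everything below is an assumption made for illustration: sites are one-hot encoded, a log-odds position-specific scoring matrix is flattened into a query vector (its inner product with an encoded site equals the usual matrix score), and `mean_difference_query` is a hypothetical stand-in that points from the mean negative example to the mean positive example. It is not the authors' NPV or ODV method, and the example sites are made up.

```python
import math

ALPHABET = "ACGT"


def one_hot(site):
    """Embed a DNA string of length l as a 4*l-dimensional 0/1 vector."""
    vec = []
    for base in site:
        vec.extend(1.0 if base == a else 0.0 for a in ALPHABET)
    return vec


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def pssm_query(sites, background=0.25, pseudocount=1.0):
    """Flatten a log-odds scoring matrix into a query vector.

    dot(one_hot(s), pssm_query(sites)) equals the matrix score of s.
    """
    query = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        for a in ALPHABET:
            freq = (column.count(a) + pseudocount) / (len(sites) + 4 * pseudocount)
            query.append(math.log(freq / background))
    return query


def mean_difference_query(positives, negatives):
    """Hypothetical query vector: mean positive minus mean negative (an assumption)."""
    pos = [one_hot(s) for s in positives]
    neg = [one_hot(s) for s in negatives]
    dim = len(pos[0])
    pos_mean = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    neg_mean = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def scan(sequence, query, length):
    """Score every window of the sequence; return (start, score), best first."""
    hits = [(i, dot(one_hot(sequence[i:i + length]), query))
            for i in range(len(sequence) - length + 1)]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Made-up toy data, for illustration only.
positives = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
negatives = ["GGCCGC", "CCGGCA", "GCGCGG", "CGCCGG"]
query = mean_difference_query(positives, negatives)
best_start, best_score = scan("GGCGTATAATCCGG", query, 6)[0]
```

In this toy setting both query vectors rank the window starting at index 4 (`TATAAT`) first. The point of the sketch is only that different search methods can be expressed as different query vectors scored by the same inner product, which is one way to read the framework described in the abstract.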
Pages: 12