Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences

Cited by: 59
Authors
Burden, S [1]
Lin, YX
Zhang, R
Affiliations
[1] Univ Wollongong, Dept Appl Math & Stat, Wollongong, NSW 2522, Australia
[2] Univ Wollongong, Dept Biol Sci, Wollongong, NSW 2522, Australia
DOI
10.1093/bioinformatics/bti047
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Although a great deal of research has been done on promoter prediction, prediction techniques are still not fully developed. Many algorithms exhibit either poor specificity, generating many false positives, or poor sensitivity. The neural network prediction program NNPP2.2 is one such example. Results: To improve the NNPP2.2 prediction technique, the distance between the transcription start site (TSS) associated with a promoter and the translation start site (TLS) of the downstream gene coding region has been studied for Escherichia coli K12. An empirical probability distribution of this distance that is consistent across all E. coli promoters has been established. This information is combined with the output of NNPP2.2 to create a new technique, TLS-NNPP, which improves the specificity of promoter prediction. The technique is shown to be effective on E. coli DNA sequences; however, it is applicable to any organism for which a set of promoters has been experimentally defined.
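The abstract describes two steps: estimate an empirical distribution of TSS-to-TLS distances from experimentally defined promoters, then use it to re-weight NNPP2.2 predictions. The sketch below illustrates that idea only under stated assumptions. The abstract does not give the exact TLS-NNPP combination rule, so the product `nnpp_score * probability`, the 50 bp bin width, the forward-strand-only handling, and all function names (`empirical_distance_distribution`, `distance_probability`, `rescore_predictions`) are hypothetical choices, not the authors' implementation. The numbers in the usage example are toy values, not data from the paper.

```python
from bisect import bisect_left
from collections import Counter


def empirical_distance_distribution(distances, bin_width=50):
    """Estimate P(distance bin) from known TSS-to-TLS distances in bp.

    Assumption: distances are grouped into fixed-width bins (hypothetical
    50 bp default). Returns a dict mapping bin index -> relative frequency.
    """
    counts = Counter(d // bin_width for d in distances)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def distance_probability(distance, dist_probs, bin_width=50):
    # Bins never observed among the known promoters get probability 0.
    return dist_probs.get(distance // bin_width, 0.0)


def rescore_predictions(predictions, tls_positions, dist_probs, bin_width=50):
    """Re-weight NNPP-style promoter scores with the distance distribution.

    predictions: list of (predicted_tss, nnpp_score), forward strand only.
    tls_positions: sorted list of annotated TLS (start codon) positions.
    Hypothetical combination rule: new_score = nnpp_score * probability.
    """
    rescored = []
    for tss, score in predictions:
        i = bisect_left(tls_positions, tss)  # first TLS at or after the TSS
        if i == len(tls_positions):
            continue  # no downstream gene, so the prediction is discarded
        distance = tls_positions[i] - tss
        prob = distance_probability(distance, dist_probs, bin_width)
        rescored.append((tss, score * prob))
    return rescored


if __name__ == "__main__":
    # Toy TSS-to-TLS distances standing in for a curated promoter set.
    probs = empirical_distance_distribution([30, 45, 60, 80, 120, 150, 400])
    preds = [(100, 0.9), (500, 0.95), (1250, 0.8), (2500, 0.99)]
    tls = [140, 1300, 2000]
    for tss, new_score in rescore_predictions(preds, tls, probs):
        print(tss, round(new_score, 3))
```

In this toy run, the prediction at position 500 has a high raw score but sits 800 bp upstream of the nearest gene, a distance never seen in the training set, so its combined score drops to zero. This is the kind of false positive that a distance prior is meant to suppress.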
Pages: 601-607
Number of pages: 7