Benchmarking Bacterial Promoter Prediction Tools: Potentialities and Limitations

Citations: 26
Authors
Anzolini Cassiano, Murilo Henrique [1]
Silva-Rocha, Rafael [1]
Affiliations
[1] FMRP Univ Sao Paulo, Ribeirao Preto, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
promoter prediction; bacterial promoters; cis-regulatory elements; bioinformatics; ESCHERICHIA-COLI; TRANSCRIPTION INITIATION; RECOGNITION; SIGMA(70); SEQUENCE; ELEMENTS;
DOI
10.1128/mSystems.00439-20
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification codes
071005 ; 100705 ;
Abstract
The promoter region is a key element required for the production of RNA in bacteria. While new high-throughput technology allows massively parallel mapping of promoter elements, we still rely mainly on bioinformatics tools to predict such elements in bacterial genomes. Moreover, although many prediction tools have become popular for identifying bacterial promoters, no systematic comparison of these tools has been performed. Here, we systematically compared several widely used promoter prediction tools (BPROM, bTSSfinder, BacPP, CNNProm, IBBP, Virtual Footprint, iPro70-FMWin, 70ProPred, iPromoter-2L, and MULTiPly) using well-defined sequence data sets and standardized metrics to determine how well the tools performed relative to one another. For this, we used data sets of experimentally validated promoters from Escherichia coli and a control data set composed of randomly generated sequences with similar nucleotide distributions. We compared the performance of the tools using metrics such as specificity, sensitivity, accuracy, and Matthews correlation coefficient (MCC). We show that the widely used BPROM presented the worst performance among the compared tools, while four tools (CNNProm, iPro70-FMWin, 70ProPred, and iPromoter-2L) offered high predictive power. Of these tools, iPro70-FMWin exhibited the best results for most of the metrics used. We present here some potentials and limitations of the available tools, and we hope that future work can build upon our effort to systematically characterize this useful class of bioinformatics tools.
IMPORTANCE The correct mapping of promoter elements is a crucial step in microbial genomics. In addition, when new DNA elements are combined into synthetic sequences, predicting the potential generation of new promoter sequences is critical. In recent years, many bioinformatics tools have been created to allow users to predict promoter elements in a sequence or genome of interest. Here, we assess the predictive power of some of the main prediction tools available using well-defined promoter data sets. Using Escherichia coli as a model organism, we demonstrated that while some tools are biased toward AT-rich sequences, others are very efficient in identifying real promoters with low false-negative rates. We hope the potentials and limitations presented here will help the microbiology community to choose promoter prediction tools among the many available alternatives.
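The four metrics named in the abstract (sensitivity, specificity, accuracy, and MCC) can all be derived from a binary confusion matrix. The following is a minimal sketch using their standard textbook definitions, not the authors' own evaluation code. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: real promoters predicted as promoters
    fn: real promoters missed by the tool
    tn: control sequences correctly rejected
    fp: control sequences wrongly called promoters
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Fraction of real promoters that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Fraction of control sequences that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Fraction of all predictions that were correct.
    accuracy = (tp + tn) / total

    # MCC ranges from -1 to 1; by convention it is set to 0 when the
    # denominator is zero (one row or column of the matrix is empty).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not results from the paper).
example = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so a tool that labels nearly every sequence as a promoter scores poorly on MCC even when its sensitivity is high.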
Pages: 16