Computational enhancer prediction: evaluation and improvements

Cited by: 13
Authors
Asma, Hasiba [1]
Halfon, Marc S. [1, 2, 3, 4, 5, 6, 7]
Affiliations
[1] SUNY Buffalo, Program Genet Genom & Bioinformat, 701 Ellicott St, Buffalo, NY 14203 USA
[2] SUNY Buffalo, Dept Biochem, 701 Ellicott St, Buffalo, NY 14203 USA
[3] SUNY Buffalo, Dept Biol Sci, 701 Ellicott St, Buffalo, NY 14203 USA
[4] SUNY Buffalo, Dept Biomed Informat, 701 Ellicott St, Buffalo, NY 14203 USA
[5] NY State Ctr Excellence Bioinformat & Life Sci, 701 Ellicott St, Buffalo, NY 14203 USA
[6] Roswell Pk Comprehens Canc Ctr, Mol & Cellular Biol Dept, Buffalo, NY 14263 USA
[7] Roswell Pk Comprehens Canc Ctr, Program Canc Genet, Buffalo, NY 14263 USA
Keywords
CIS-REGULATORY MODULES; DISCOVERY
DOI
10.1186/s12859-019-2781-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Identifying transcriptional enhancers and other cis-regulatory modules (CRMs) is an important goal of post-sequencing genome annotation. Computational approaches provide a useful complement to empirical methods for CRM discovery, but it is critical that we develop effective means to evaluate their performance in terms of estimating their sensitivity and specificity.
Results: We introduce here pCRMeval, a pipeline for in silico evaluation of any enhancer prediction tools that are flexible enough to be applied to the Drosophila melanogaster genome. pCRMeval compares the result of predictions with the extensive existing knowledge of experimentally-validated Drosophila CRMs in order to estimate the precision and relative sensitivity of the prediction method. In the case of supervised prediction methods, when training data composed of validated CRMs are used, pCRMeval can also assess the sensitivity of specific training sets. We demonstrate the utility of pCRMeval through evaluation of our SCRMshaw CRM prediction method and training data. By measuring the impact of different parameters on SCRMshaw performance, as assessed by pCRMeval, we develop a more robust version of SCRMshaw, SCRMshaw_HD, that improves the number of predictions while maintaining sensitivity and specificity. Our analysis also demonstrates that SCRMshaw_HD, when applied to increasingly less well-assembled genomes, maintains its strong predictive power with only a minor drop-off in performance.
Conclusion: Our pCRMeval pipeline provides a general framework for evaluation that can be applied to any CRM prediction method, particularly a supervised method. While we make use of it here primarily to test and improve a particular method for CRM prediction, SCRMshaw, pCRMeval should provide a valuable platform to the research community not only for evaluating individual methods, but also for comparing between competing methods.
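Illustrative note: the abstract describes comparing predicted CRMs against experimentally validated ones to estimate precision and relative sensitivity. The Python sketch below shows one naive way such an overlap-based comparison could be computed. It is not the pCRMeval implementation; the function names (`overlaps`, `evaluate`), the half-open coordinate convention, and the 1 bp overlap threshold are all assumptions made for illustration. Because the set of validated CRMs is incomplete, a prediction that overlaps no known CRM is not necessarily a false positive, so the values are rough estimates at best.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval, min_overlap: int = 1) -> bool:
    """Return True if a and b share at least min_overlap bases on the same chromosome."""
    if a[0] != b[0]:
        return False
    return min(a[2], b[2]) - max(a[1], b[1]) >= min_overlap


def evaluate(predictions: List[Interval], known: List[Interval]) -> Tuple[float, float]:
    """Naive overlap-based estimates.

    precision   = fraction of predictions overlapping any known CRM
    sensitivity = fraction of known CRMs overlapped by any prediction
    """
    hit_predictions = sum(any(overlaps(p, k) for k in known) for p in predictions)
    recovered_known = sum(any(overlaps(k, p) for p in predictions) for k in known)
    precision = hit_predictions / len(predictions) if predictions else 0.0
    sensitivity = recovered_known / len(known) if known else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    # Toy coordinates, invented purely for the example.
    predicted = [("2L", 100, 200), ("2L", 500, 600), ("3R", 100, 200)]
    validated = [("2L", 150, 250), ("3R", 1000, 1100)]
    print(evaluate(predicted, validated))
```

The quadratic all-against-all comparison is acceptable for a toy example; a genome-scale comparison would more plausibly use sorted intervals or an interval tree.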
Pages: 15