REAPR: a universal tool for genome assembly evaluation

Times cited: 287
Authors
Hunt, Martin [1]
Kikuchi, Taisei [1, 2]
Sanders, Mandy [1]
Newbold, Chris [1, 3]
Berriman, Matthew [1]
Otto, Thomas D. [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Miyazaki Univ, Fac Med, Dept Infect Dis, Div Parasitol, Miyazaki 8891692, Japan
[3] Univ Oxford, John Radcliffe Hosp, Weatherall Inst Mol Med, Oxford OX3 9DS, England
Funding
Wellcome Trust (United Kingdom);
Keywords
Genome assembly; validation; evaluation; DRAFT ASSEMBLIES; SEQUENCE; ARTEMIS; DNA;
DOI
10.1186/gb-2013-14-5-r47
Chinese Library Classification numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005; 0836; 090102; 100705;
Abstract
Methods to reliably assess the accuracy of genome sequence data are lacking. Currently, completeness is only described qualitatively, and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/.
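The "corrected assembly statistics" mentioned in the abstract can be illustrated with N50. The Python sketch below is minimal and rests on one assumption: "correction" means splitting each sequence at positions flagged as errors and then recomputing N50 on the resulting fragments. The function names `n50` and `split_at_errors` and the break coordinates are hypothetical. The break positions are supplied by hand, so this does not reproduce how REAPR itself detects errors.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def split_at_errors(seq_lengths: Dict[str, int],
                    breaks: Dict[str, List[int]]) -> List[int]:
    """Split each sequence at hypothetical 0-based break positions
    (e.g. coordinates flagged as mis-assemblies by some evaluation
    step) and return the lengths of the resulting fragments."""
    fragments: List[int] = []
    for name, length in seq_lengths.items():
        start = 0
        # Ignore duplicate breaks and breaks at or beyond the sequence ends.
        for pos in sorted(set(breaks.get(name, []))):
            if 0 < pos < length:
                fragments.append(pos - start)
                start = pos
        fragments.append(length - start)
    return fragments


if __name__ == "__main__":
    scaffolds = {"scf1": 1000, "scf2": 600, "scf3": 400}
    print(n50(list(scaffolds.values())))  # 1000
    pieces = split_at_errors(scaffolds, {"scf1": [300, 700]})
    print(n50(pieces))  # 400
```

In this toy example, two hypothetical errors in the longest scaffold lower the N50 from 1000 to 400. This is the kind of drop that makes raw and corrected statistics diverge when several assemblies are compared.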
Pages: 10
Related papers (33 in total)
[1]   Limitations of next-generation genome sequence assembly [J].
Alkan, Can ;
Sajjadian, Saba ;
Eichler, Evan E. .
NATURE METHODS, 2011, 8 (01) :61-65
[2]   [Anonymous], 2010, R: A Language and Environment for Statistical Computing
[3]   BamTools: a C++ API and toolkit for analyzing and managing BAM files [J].
Barnett, Derek W. ;
Garrison, Erik K. ;
Quinlan, Aaron R. ;
Stroemberg, Michael P. ;
Marth, Gabor T. .
BIOINFORMATICS, 2011, 27 (12) :1691-1692
[4]   Toward almost closed genomes with GapFiller [J].
Boetzer, Marten ;
Pirovano, Walter .
GENOME BIOLOGY, 2012, 13 (06)
[5]   Scaffolding pre-assembled contigs using SSPACE [J].
Boetzer, Marten ;
Henkel, Christiaan V. ;
Jansen, Hans J. ;
Butler, Derek ;
Pirovano, Walter .
BIOINFORMATICS, 2011, 27 (04) :578-579
[6]   Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data [J].
Carver, Tim ;
Harris, Simon R. ;
Berriman, Matthew ;
Parkhill, Julian ;
McQuillan, Jacqueline A. .
BIOINFORMATICS, 2012, 28 (04) :464-469
[7]   ACT: the Artemis comparison tool [J].
Carver, TJ ;
Rutherford, KM ;
Berriman, M ;
Rajandream, MA ;
Barrell, BG ;
Parkhill, J .
BIOINFORMATICS, 2005, 21 (16) :3422-3423
[8]   Genome Project Standards in a New Era of Sequencing [J].
Chain, P. S. G. ;
Grafham, D. V. ;
Fulton, R. S. ;
FitzGerald, M. G. ;
Hostetler, J. ;
Muzny, D. ;
Ali, J. ;
Birren, B. ;
Bruce, D. C. ;
Buhay, C. ;
Cole, J. R. ;
Ding, Y. ;
Dugan, S. ;
Field, D. ;
Garrity, G. M. ;
Gibbs, R. ;
Graves, T. ;
Han, C. S. ;
Harrison, S. H. ;
Highlander, S. ;
Hugenholtz, P. ;
Khouri, H. M. ;
Kodira, C. D. ;
Kolker, E. ;
Kyrpides, N. C. ;
Lang, D. ;
Lapidus, A. ;
Malfatti, S. A. ;
Markowitz, V. ;
Metha, T. ;
Nelson, K. E. ;
Parkhill, J. ;
Pitluck, S. ;
Qin, X. ;
Read, T. D. ;
Schmutz, J. ;
Sozhamannan, S. ;
Sterk, P. ;
Strausberg, R. L. ;
Sutton, G. ;
Thomson, N. R. ;
Tiedje, J. M. ;
Weinstock, G. ;
Wollam, A. ;
Detter, J. C. .
SCIENCE, 2009, 326 (5950) :236-237
[9]   ALE: a generic assembly likelihood evaluation framework for assessing the accuracy of genome and metagenome assemblies [J].
Clark, Scott C. ;
Egan, Rob ;
Frazier, Peter I. ;
Wang, Zhong .
BIOINFORMATICS, 2013, 29 (04) :435-443
[10]   Assemblathon 1: A competitive assessment of de novo short read assembly methods [J].
Earl, Dent ;
Bradnam, Keith ;
St John, John ;
Darling, Aaron ;
Lin, Dawei ;
Fass, Joseph ;
Hung On Ken Yu ;
Buffalo, Vince ;
Zerbino, Daniel R. ;
Diekhans, Mark ;
Ngan Nguyen ;
Ariyaratne, Pramila Nuwantha ;
Sung, Wing-Kin ;
Ning, Zemin ;
Haimel, Matthias ;
Simpson, Jared T. ;
Fonseca, Nuno A. ;
Birol, Inanc ;
Docking, T. Roderick ;
Ho, Isaac Y. ;
Rokhsar, Daniel S. ;
Chikhi, Rayan ;
Lavenier, Dominique ;
Chapuis, Guillaume ;
Naquin, Delphine ;
Maillet, Nicolas ;
Schatz, Michael C. ;
Kelley, David R. ;
Phillippy, Adam M. ;
Koren, Sergey ;
Yang, Shiaw-Pyng ;
Wu, Wei ;
Chou, Wen-Chi ;
Srivastava, Anuj ;
Shaw, Timothy I. ;
Ruby, J. Graham ;
Skewes-Cox, Peter ;
Betegon, Miguel ;
Dimon, Michelle T. ;
Solovyev, Victor ;
Seledtsov, Igor ;
Kosarev, Petr ;
Vorobyev, Denis ;
Ramirez-Gonzalez, Ricardo ;
Leggett, Richard ;
MacLean, Dan ;
Xia, Fangfang ;
Luo, Ruibang ;
Li, Zhenyu ;
Xie, Yinlong .
GENOME RESEARCH, 2011, 21 (12) :2224-2241