GenomeScope: fast reference-free genome profiling from short reads

Cited by: 1246
Authors
Vurture, Gregory W. [1]
Sedlazeck, Fritz J. [2,3]
Nattestad, Maria [1]
Underwood, Charles J. [1]
Fang, Han [1,4]
Gurtowski, James [1]
Schatz, Michael C. [1,2,3]
Affiliations
[1] Cold Spring Harbor Lab, Simons Ctr Quantitat Biol, POB 100, Cold Spring Harbor, NY 11724 USA
[2] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[3] Johns Hopkins Univ, Dept Biol, Baltimore, MD 21218 USA
[4] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
QUALITY
DOI
10.1093/bioinformatics/btx153
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels and error rates. Availability and Implementation: http://genomescope.org, https://github.com/schatzlab/genomescope.git. Contact: mschatz@jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: 2202-2204
Page count: 3