FLASH: fast length adjustment of short reads to improve genome assemblies

Cited by: 11087
Authors
Magoc, Tanja [1]
Salzberg, Steven L. [1]
Affiliations
[1] Johns Hopkins Univ, Sch Med, McKusick Nathans Inst Genet Med, Baltimore, MD 21205 USA
Funding
US National Institutes of Health
DOI
10.1093/bioinformatics/btr507
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome.

Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.
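The idea of overlapping and merging a read pair can be illustrated with a minimal Python sketch. This is not FLASH's implementation: the function names `reverse_complement` and `merge_pair`, the parameters `min_overlap` and `max_mismatch_ratio`, and their default values are assumptions chosen for illustration, and the sketch ignores base quality scores entirely. It tries every overlap length between the end of the first read and the start of the reverse-complemented second read, and keeps the overlap with the lowest mismatch ratio.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a paired-end read pair into one longer read, or return None.

    Hypothetical helper for illustration only. read2 is assumed to come from
    the opposite strand, so it is reverse-complemented before comparison.
    """
    r2 = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    max_len = min(len(read1), len(r2))
    for olen in range(min_overlap, max_len + 1):
        suffix = read1[-olen:]   # end of the first read
        prefix = r2[:olen]       # start of the second read
        mismatches = sum(1 for a, b in zip(suffix, prefix) if a != b)
        ratio = mismatches / olen
        # Keep the first overlap with the strictly lowest mismatch ratio.
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, olen)
    if best is None:
        return None  # no acceptable overlap: leave the pair unmerged
    olen = best[1]
    # Simplification: on a mismatch inside the overlap, read1's base is kept.
    return read1 + r2[olen:]


# Usage: a 40 bp fragment sequenced with 25 bp reads overlaps by 10 bp,
# because the fragment is shorter than twice the read length.
fragment = "ACGTTGCATGCCAGTACGGATCCATGCAAGTCTAGGCTAA"
read1 = fragment[:25]
read2 = reverse_complement(fragment[-25:])
merged = merge_pair(read1, read2)
print(merged == fragment)
```

A real tool would also need to weigh base qualities when resolving mismatches and to handle pairs whose best overlap is ambiguous, for example in repetitive sequence.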
Pages: 2957-2963
Number of pages: 7