HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads

Cited by: 15
Authors
Al-okaily, Anas A. [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
Source
BMC GENOMICS | 2016 / Vol. 17
Funding
National Institute of Food and Agriculture (USA);
Keywords
Computational genomic; De novo genome assembly; Contigs assembly; SHORT DNA-SEQUENCES; ALGORITHM; GAUGE;
DOI
10.1186/s12864-016-2515-7
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Current high-throughput sequencing technologies generate large numbers of relatively short and error-prone reads, making the de novo assembly problem challenging. Although high-quality assemblies can be obtained by assembling multiple paired-end libraries with both short and long insert sizes, the latter are costly to generate. Recently, the GAGE-B study showed that remarkably good assembly quality can be obtained for bacterial genomes by state-of-the-art assemblers run on a single short-insert library with very high coverage. Results: In this paper, we introduce a novel hierarchical genome assembly (HGA) methodology that takes further advantage of such very high coverage by independently assembling disjoint subsets of reads, combining assemblies of the subsets, and finally re-assembling the combined contigs along with the original reads. Conclusions: We empirically evaluated this methodology for 8 leading assemblers using 7 GAGE-B bacterial datasets consisting of 100 bp Illumina HiSeq and 250 bp Illumina MiSeq reads, with coverage ranging from 100x to approximately 200x. The results show that for all evaluated datasets and using most evaluated assemblers (that were used to assemble the disjoint subsets), HGA leads to a significant improvement in the quality of the assembly based on N50 and corrected N50 metrics.
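The abstract names two ideas that are easy to illustrate in code: splitting the reads into disjoint subsets before assembly, and scoring an assembly by N50. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. The function names `partition_reads` and `n50` are hypothetical, and round-robin assignment is an assumed partitioning scheme, since the abstract does not say how the subsets are chosen. Each element of `reads` is treated as one unit, so for paired-end data a unit would likely be a read pair. The assembly, merging, and re-assembly steps are left to an external assembler and are not shown. Corrected N50 is also omitted because it requires alignment to a reference genome.

```python
from typing import List, Sequence


def partition_reads(reads: Sequence[str], num_subsets: int) -> List[List[str]]:
    """Split reads into disjoint subsets by round-robin assignment.

    Round-robin is an assumption made for illustration; the abstract only
    states that the subsets are disjoint.
    """
    if num_subsets < 1:
        raise ValueError("num_subsets must be at least 1")
    subsets: List[List[str]] = [[] for _ in range(num_subsets)]
    for index, read in enumerate(reads):
        # Each read lands in exactly one subset, so the subsets are disjoint.
        subsets[index % num_subsets].append(read)
    return subsets


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L of the contig at which, walking contigs from longest
    to shortest, the running total first reaches half of the total assembly
    length. Returns 0 for an empty input.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

In a hierarchical workflow of the kind the abstract describes, each subset returned by `partition_reads` would be assembled separately, and `n50` could then be used to compare the contig lengths of the final assembly against those of a single-pass assembly of all reads.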
Pages: 11