NOVOPlasty: de novo assembly of organelle genomes from whole genome data

Cited by: 2933
Authors
Dierckxsens, Nicolas [1, 2]
Mardulyn, Patrick [1, 2, 3]
Smits, Guillaume [1, 2, 4, 5]
Affiliations
[1] Univ Libre Bruxelles, Interuniv Inst Bioinformat Brussels, Triomflaan CP 263, B-1050 Brussels, Belgium
[2] Vrije Univ Brussel, Triomflaan CP 263, B-1050 Brussels, Belgium
[3] Univ Libre Bruxelles, Fac Sci, Evolutionary Biol & Ecol Unit, CP 160-12,Av FD Roosevelt 50, B-1050 Brussels, Belgium
[4] Univ Libre Bruxelles, Hop Univ Enfants Reine Fabiola, Genet, Brussels, Belgium
[5] Univ Libre Bruxelles, Ctr Med Genet, Hop Erasme, Route Lennik 808, B-1070 Brussels, Belgium
Keywords
SHORT DNA-SEQUENCES; CHLOROPLAST GENOME;
DOI
10.1093/nar/gkw955
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
The evolution in next-generation sequencing (NGS) technology has led to the development of many different assembly algorithms, but few of them focus on assembling the organelle genomes. These genomes are used in phylogenetic studies, food identification and are the most deposited eukaryotic genomes in GenBank. Producing organelle genome assembly from whole genome sequencing (WGS) data would be the most accurate and least laborious approach, but a tool specifically designed for this task is lacking. We developed a seed-and-extend algorithm that assembles organelle genomes from whole genome sequencing (WGS) data, starting from a related or distant single seed sequence. The algorithm has been tested on several new (Gonioctena intermedia and Avicennia marina) and public (Arabidopsis thaliana and Oryza sativa) whole genome Illumina data sets where it outperforms known assemblers in assembly accuracy and coverage. In our benchmark, NOVOPlasty assembled all tested circular genomes in less than 30 min with a maximum memory requirement of 16 GB and an accuracy over 99.99%. In conclusion, NOVOPlasty is the sole de novo assembler that provides a fast and straightforward extraction of the extranuclear genomes from WGS data in one circular high quality contig. The software is open source and can be downloaded at https://github.com/ndierckx/NOVOPlasty.
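The seed-and-extend idea named in the abstract can be illustrated with a toy sketch. This is not NOVOPlasty's implementation: the function names (`build_kmer_index`, `assemble_from_seed`), the parameters `k` and `max_len`, and the example data are all hypothetical, and the sketch assumes error-free single-end reads, a seed that matches the reads exactly, extension in one direction only, and no repeats longer than `k`. It indexes every k-mer in the reads, extends the seed one base at a time by majority vote, and stops when the contig closes on itself.

```python
from collections import Counter, defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to a tally of the bases that follow it in the reads."""
    index = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            index[read[i:i + k]][read[i + k]] += 1
    return index


def assemble_from_seed(seed, reads, k=5, max_len=10_000):
    """Extend `seed` to the right; return (contig, is_circular).

    Toy assumptions: exact seed match, no sequencing errors,
    no repeats longer than k, forward extension only.
    """
    if len(seed) < k:
        raise ValueError("seed must be at least k bases long")
    index = build_kmer_index(reads, k)
    contig = seed
    while len(contig) < max_len:
        votes = index.get(contig[-k:])
        if not votes:
            # No read continues the current end: stop with a linear contig.
            return contig, False
        # Append the most frequently observed next base.
        contig += votes.most_common(1)[0][0]
        # The contig is closed once its last k bases repeat its first k bases.
        if len(contig) >= 2 * k and contig[-k:] == contig[:k]:
            return contig[:-k], True
    return contig, False


if __name__ == "__main__":
    # Hypothetical 20 bp circular "genome" and reads tiled around it.
    genome = "ACGTTGCATGGACCTAGGCT"
    doubled = genome * 2
    reads = [doubled[i:i + 10] for i in range(len(genome))]
    contig, circular = assemble_from_seed("TGCATGG", reads, k=5)
    print(contig, circular)
```

The result is a rotation of the input genome that starts at the seed, which mirrors the abstract's goal of a single circular contig. A practical assembler would additionally need to handle sequencing errors, paired-end information, divergent seeds, and repeated regions, none of which this sketch attempts.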
Pages: 9
Related papers
24 items in total
[1]  
Ahmed F., 2015, J. Next Generat. Sequenc. Applic, V2, P2, DOI 10.4172/2469-9853.1000119
[2]   Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae) [J].
Alverson, Andrew J. ;
Wei, XiaoXin ;
Rice, Danny W. ;
Stern, David B. ;
Barry, Kerrie ;
Palmer, Jeffrey D. .
MOLECULAR BIOLOGY AND EVOLUTION, 2010, 27 (06) :1436-1448
[3]  
Benson DA, 2010, NUCLEIC ACIDS RES, V38, pD46, DOI [10.1093/nar/gkp1024, 10.1093/nar/gkx1094, 10.1093/nar/gkl986, 10.1093/nar/gkw1070, 10.1093/nar/gks1195, 10.1093/nar/gkn723, 10.1093/nar/gkg057, 10.1093/nar/gkr1202, 10.1093/nar/gkq1079]
[4]  
Bignell G R, 1996, Methods Mol Biol, V53, P109
[5]   Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding [J].
Brozynska, Marta ;
Furtado, Agnelo ;
Henry, Robert James .
PLOS ONE, 2014, 9 (10)
[6]  
Chevreux B., 1999, Proceedings of the German Conference on Bioinformatics (GCB), V99, P45
[7]   The NCBI Taxonomy database [J].
Federhen, Scott .
NUCLEIC ACIDS RESEARCH, 2012, 40 (D1) :D136-D143
[8]   Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads-a baiting and iterative mapping approach [J].
Hahn, Christoph ;
Bachmann, Lutz ;
Chevreux, Bastien .
NUCLEIC ACIDS RESEARCH, 2013, 41 (13) :e129
[9]   Methods for obtaining and analyzing whole chloroplast genome sequences [J].
Jansen, RK ;
Raubeson, LA ;
Boore, JL ;
DePamphilis, CW ;
Chumley, TW ;
Haberle, RC ;
Wyman, SK ;
Alverson, AJ ;
Peery, R ;
Herman, SJ ;
Fourcade, HM ;
Kuehl, JV ;
McNeal, JR ;
Leebens-Mack, J ;
Cui, L .
MOLECULAR EVOLUTION: PRODUCING THE BIOCHEMICAL DATA, PART B, 2005, 395 :348-384
[10]   Extending assembly of short DNA sequences to handle error [J].
Jeck, William R. ;
Reinhardt, Josephine A. ;
Baltrus, David A. ;
Hickenbotham, Matthew T. ;
Magrini, Vincent ;
Mardis, Elaine R. ;
Dangl, Jeffery L. ;
Jones, Corbin D. .
BIOINFORMATICS, 2007, 23 (21) :2942-2944