Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage

Cited by: 345
Authors
Chakraborty, Mahul [1]
Baldwin-Brown, James G. [1]
Long, Anthony D. [1, 2]
Emerson, J. J. [1, 2]
Affiliations
[1] Univ Calif Irvine, Dept Ecol & Evolutionary Biol, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Ctr Complex Biol Syst, Irvine, CA 92697 USA
Keywords
MOLECULE SEQUENCING READS; HYBRID ERROR-CORRECTION; MICROBIAL GENOMES; DROSOPHILA; QUALITY;
DOI
10.1093/nar/gkw654
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Genome assemblies that are accurate, complete and contiguous are essential for identifying important structural and functional elements of genomes and for identifying genetic variation. Nevertheless, most recent genome assemblies remain incomplete and fragmented. While long molecule sequencing promises to deliver more complete genome assemblies with fewer gaps, concerns about error rates, low yields, stringent DNA requirements and uncertainty about best practices may discourage many investigators from adopting this technology. Here, in conjunction with the platinum standard Drosophila melanogaster reference genome, we analyze recently published long molecule sequencing data to identify what governs completeness and contiguity of genome assemblies. We also present a hybrid meta-assembly approach that achieves remarkable assembly contiguity for both Drosophila and human assemblies with only modest long molecule sequencing coverage. Our results motivate a set of preliminary best practices for obtaining accurate and contiguous assemblies, a 'missing manual' that guides key decisions in building high quality de novo genome assemblies, from DNA isolation to polishing the assembly.
Pages: 12