Sequence assembly using next generation sequencing data-challenges and solutions

Cited by: 14
Authors
Chin, Francis Y. L. [1]
Leung, Henry C. M. [1]
Yiu, S. M. [1]
Affiliations
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Keywords
genomic assembling; de Bruijn graph; paired-end reads; next generation sequencing; DE-NOVO ASSEMBLER; SHORT DNA-SEQUENCES; NUCLEOTIDE-SEQUENCE; IDBA; MILLIONS; READS;
DOI
10.1007/s11427-014-4752-9
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Sequence assembly is an important step in bioinformatics studies. With next generation sequencing (NGS) technology, DNA fragments (reads) can be sampled randomly and at high throughput from a DNA or RNA molecule. However, because the positions from which the reads were sampled are unknown, an assembly process is required to combine overlapping reads and reconstruct the original DNA or RNA sequence. Compared with traditional Sanger sequencing, NGS offers higher throughput, but its reads are shorter and have a higher error rate, which introduces several problems in assembly. Moreover, paired-end reads, which contain more information than single-end reads, can be sampled. Existing assemblers cannot fully utilize this information and fail to assemble longer contigs. In this article, we revisit the major problems of assembling NGS reads from genomic, transcriptomic, metagenomic and metatranscriptomic data. We also describe our IDBA package for solving these problems. The IDBA package adopts several novel ideas in assembly, including the use of multiple k values, local assembly and progressive depth removal. Compared with existing assemblers, IDBA performs better on many simulated and real sequencing datasets.
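To make the idea of combining overlapping reads concrete, the following is a minimal Python sketch of a de Bruijn graph built from reads for a single k, followed by a walk along an unambiguous path to recover a contig. It is an illustration only and not the IDBA implementation: the function names `build_de_bruijn` and `extend_contig` are hypothetical, the reads are assumed to be error-free and from one strand, and features named in the abstract (multiple k values, local assembly, progressive depth removal, paired-end information) are not modeled.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer in a read adds an edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while the path is unambiguous,
    i.e. the current node has exactly one successor and that
    successor has exactly one predecessor."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1

    contig = start
    node = start
    seen = {start}  # guards against looping forever on a cycle
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if in_degree[nxt] != 1 or nxt in seen:
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


# Three overlapping, error-free toy reads (assumed example data).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
print(extend_contig(graph, "ATG"))
```

In this toy case every (k-1)-mer is unique, so the walk never branches. With real NGS data, sequencing errors and repeats create branches, and a small k merges repeats while a large k breaks the graph where coverage is low; that trade-off is the motivation for using more than one k.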
Pages: 1140-1148
Page count: 9