Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges

Cited by: 79
Authors
El-Metwally, Sara [1]
Hamza, Taher [1]
Zakaria, Magdi [1]
Helmy, Mohamed [2, 3]
Affiliations
[1] Mansoura Univ, Dept Comp Sci, Fac Comp & Informat, Mansoura, Egypt
[2] Al Azhar Univ, Dept Bot, Fac Agr, Cairo, Egypt
[3] Al Azhar Univ, Fac Agr, Dept Biotechnol, Cairo, Egypt
Keywords
READ ERROR-CORRECTION; SHORT DNA-SEQUENCES; DE-BRUIJN GRAPHS; GENOME SEQUENCE; STRING GRAPH; PAIRED READS; ALGORITHM; TECHNOLOGIES; VELVET; PLATFORMS;
DOI
10.1371/journal.pcbi.1003345
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
Pages: 19
Related papers
107 in total
[1]   Limitations of next-generation genome sequence assembly [J].
Alkan, Can ;
Sajjadian, Saba ;
Eichler, Evan E. .
NATURE METHODS, 2011, 8 (01) :61-65
[2]   A new approach to sequence comparison: normalized sequence alignment [J].
Arslan, AN ;
Egecioglu, Ö ;
Pevzner, PA .
BIOINFORMATICS, 2001, 17 (04) :327-337
[3]   High quality draft sequences for prokaryotic genomes using a mix of new sequencing technologies [J].
Aury, Jean-Marc ;
Cruaud, Corinne ;
Barbe, Valerie ;
Rogier, Odile ;
Mangenot, Sophie ;
Samson, Gaelle ;
Poulain, Julie ;
Anthouard, Veronique ;
Scarpelli, Claude ;
Artiguenave, Francois ;
Wincker, Patrick .
BMC GENOMICS, 2008, 9 (1)
[4]   Scaffolding pre-assembled contigs using SSPACE [J].
Boetzer, Marten ;
Henkel, Christiaan V. ;
Jansen, Hans J. ;
Butler, Derek ;
Pirovano, Walter .
BIOINFORMATICS, 2011, 27 (04) :578-579
[5]  
Bowe Alexander, 2012, Algorithms in Bioinformatics. Proceedings of the 12th International Workshop, WABI 2012, P225, DOI 10.1007/978-3-642-33122-0_18
[6]   Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species [J].
Bradnam, Keith R. ;
Fass, Joseph N. ;
Alexandrov, Anton ;
Baranay, Paul ;
Bechner, Michael ;
Birol, Inanc ;
Boisvert, Sebastien ;
Chapman, Jarrod A. ;
Chapuis, Guillaume ;
Chikhi, Rayan ;
Chitsaz, Hamidreza ;
Chou, Wen-Chi ;
Corbeil, Jacques ;
Del Fabbro, Cristian ;
Docking, T. Roderick ;
Durbin, Richard ;
Earl, Dent ;
Emrich, Scott ;
Fedotov, Pavel ;
Fonseca, Nuno A. ;
Ganapathy, Ganeshkumar ;
Gibbs, Richard A. ;
Gnerre, Sante ;
Godzaridis, Elenie ;
Goldstein, Steve ;
Haimel, Matthias ;
Hall, Giles ;
Haussler, David ;
Hiatt, Joseph B. ;
Ho, Isaac Y. ;
Howard, Jason ;
Hunt, Martin ;
Jackman, Shaun D. ;
Jaffe, David B. ;
Jarvis, Erich D. ;
Jiang, Huaiyang ;
Kazakov, Sergey ;
Kersey, Paul J. ;
Kitzman, Jacob O. ;
Knight, James R. ;
Koren, Sergey ;
Lam, Tak-Wah ;
Lavenier, Dominique ;
Laviolette, Francois ;
Li, Yingrui ;
Li, Zhenyu ;
Liu, Binghang ;
Liu, Yue ;
Luo, Ruibang ;
MacCallum, Iain .
GIGASCIENCE, 2013, 2
[7]   QSRA - a quality-value guided de novo short read assembler [J].
Bryant, Douglas W., Jr. ;
Wong, Weng-Keen ;
Mockler, Todd C. .
BMC BIOINFORMATICS, 2009, 10
[8]   ALLPATHS: De novo assembly of whole-genome shotgun microreads [J].
Butler, Jonathan ;
MacCallum, Iain ;
Kleber, Michael ;
Shlyakhter, Ilya A. ;
Belmonte, Matthew K. ;
Lander, Eric S. ;
Nusbaum, Chad ;
Jaffe, David B. .
GENOME RESEARCH, 2008, 18 (05) :810-820
[9]   Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study [J].
Cerdeira, Louise Teixeira ;
Carneiro, Adriana Ribeiro ;
Juca Ramos, Rommel Thiago ;
de Almeida, Sintia Silva ;
D'Afonseca, Vivian ;
Cruz Schneider, Maria Paula ;
Baumbach, Jan ;
Tauch, Andreas ;
McCulloch, John Anthony ;
Carvalho Azevedo, Vasco Ariston ;
Silva, Artur .
JOURNAL OF MICROBIOLOGICAL METHODS, 2011, 86 (02) :218-223
[10]   Fragment assembly with short reads [J].
Chaisson, M ;
Pevzner, P ;
Tang, HX .
BIOINFORMATICS, 2004, 20 (13) :2067-2074