Sequence assembly demystified

Cited by: 283
Authors
Nagarajan, Niranjan [1]
Pop, Mihai [2]
Affiliations
[1] Genome Inst Singapore, Singapore 138672, Singapore
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
Funding
National Science Foundation (USA);
Keywords
DE-BRUIJN GRAPHS; RNA-SEQ DATA; QUASI-SPECIES RECONSTRUCTION; GENOME ASSEMBLIES; STRUCTURAL VARIATION; SINGLE-CELL; SHORT READS; BACTERIAL GENOMES; DRAFT ASSEMBLIES; RESTRICTION MAPS;
DOI
10.1038/nrg3367
Chinese Library Classification (CLC) number
Q3 [Genetics];
Subject classification code
071007 ; 090102 ;
Abstract
Advances in sequencing technologies and increased access to sequencing services have led to renewed interest in sequence and genome assembly. Concurrently, new applications for sequencing have emerged, including gene expression analysis, discovery of genomic variants and metagenomics, and each of these has different needs and challenges in terms of assembly. We survey the theoretical foundations that underlie modern assembly and highlight the options and practical trade-offs that need to be considered, focusing on how individual features address the needs of specific applications. We also review key software and the interplay between experimental design and efficacy of assembly.
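De Bruijn graphs, listed in the keywords, are one of the theoretical foundations the review covers. The Python below is a minimal sketch, not the authors' code. It assumes error-free reads, a fixed k-mer length `k`, and a genome in which every (k-1)-mer occurs only once. Under those assumptions it builds a de Bruijn graph from the distinct k-mers in the reads and spells the genome from an Eulerian path found with Hierholzer's algorithm. The function names (`de_bruijn_graph`, `eulerian_path`, `assemble`) are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    # Deduplicate k-mers so that overlapping reads do not create duplicate edges.
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists in the graph."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # A path starts at the node with one more outgoing than incoming edge;
    # if none exists, the graph has an Eulerian circuit and any node will do.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), None)
    if start is None:
        start = next(iter(graph))
    adj = {u: list(vs) for u, vs in graph.items()}  # copy; edges are consumed
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in its subpath
    return path[::-1]


def assemble(reads, k):
    """Spell a sequence: first node, then the last base of every later node."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(["ATGGCGT", "GCGTACA"], 4))  # ATGGCGTACA
```

When some (k-1)-mer occurs more than once in the genome, the graph can admit several Eulerian paths that spell different sequences, so the reconstruction is no longer unique. This is one way repeats limit what assembly can recover.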
Pages: 157-167
Number of pages: 11