In vitro, long-range sequence information for de novo genome assembly via transposase contiguity

Cited by: 127
Authors
Adey, Andrew [1]
Kitzman, Jacob O. [1]
Burton, Joshua N. [1]
Daza, Riza [1]
Kumar, Akash [1]
Christiansen, Lena [2]
Ronaghi, Mostafa [2]
Amini, Sasan [2]
Gunderson, Kevin L. [2]
Steemers, Frank J. [2]
Shendure, Jay [1]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98115 USA
[2] Illumina Inc, Adv Res Grp, San Diego, CA 92122 USA
Funding
U.S. National Science Foundation
Keywords
LOW-INPUT; CONSTRUCTION; CHROMATIN; DATABASE; READS
DOI
10.1101/gr.178319.114
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We describe a method that exploits contiguity-preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprising 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to >1 megabase. These pools are "subhaploid," in that the lengths of the fragments contained in each pool sum to approximately 5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data are mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate "joins" are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof of concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
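The join-and-resolve idea in the abstract (contig ends that co-occur in the same indexed pools become candidate joins, and the resulting graph is resolved by a minimum spanning tree) can be illustrated with a small sketch. This is not the fragScaff implementation: the Jaccard index as the co-occurrence score, the `min_score` threshold, the `1 - score` edge weight, and all function and variable names are assumptions chosen for illustration, since the abstract does not specify the statistic or the data layout.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared-pool fraction of two pool-ID sets (assumed stand-in score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_joins(pools_by_end, min_score=0.5):
    """Score every pair of contig ends; keep pairs above a hypothetical cutoff.

    pools_by_end maps (contig_name, "L" or "R") -> set of pool indices in
    which reads mapped near that contig end.
    """
    joins = []
    for (end_a, pools_a), (end_b, pools_b) in combinations(
        sorted(pools_by_end.items()), 2
    ):
        if end_a[0] == end_b[0]:
            continue  # the two ends of one contig are not a join
        score = jaccard(pools_a, pools_b)
        if score >= min_score:
            # Low weight = strong co-occurrence, so a minimum tree keeps it.
            joins.append((1.0 - score, end_a, end_b))
    return joins


def minimum_spanning_forest(joins):
    """Kruskal's algorithm over contigs, using union-find to reject cycles."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for weight, end_a, end_b in sorted(joins):
        root_a, root_b = find(end_a[0]), find(end_b[0])
        if root_a != root_b:
            parent[root_a] = root_b
            kept.append((end_a, end_b, weight))
    return kept


# Toy input: three contigs, pool indices invented for illustration only.
pools_by_end = {
    ("c1", "L"): {20, 21},
    ("c1", "R"): {1, 2, 3, 4},
    ("c2", "L"): {1, 2, 3, 5},
    ("c2", "R"): {7, 8, 9},
    ("c3", "L"): {7, 8, 9, 10},
    ("c3", "R"): {30, 31},
}
tree = minimum_spanning_forest(candidate_joins(pools_by_end))
for end_a, end_b, weight in tree:
    print(end_a, "--", end_b, "weight", round(weight, 2))
```

In this toy input the right end of c1 shares pools with the left end of c2, and the right end of c2 with the left end of c3, so the sketch links the contigs in the order c1, c2, c3. A real scaffolder would also need to handle orientation conflicts, repeats, and the expected background rate of pool sharing, none of which are modeled here.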
Pages: 2041-2049
Page count: 9