Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise

Times cited: 97
Authors
Peona, Valentina [1, 2]
Blom, Mozes P. K. [3, 4]
Xu, Luohao [5]
Burri, Reto [6]
Sullivan, Shawn [7]
Bunikis, Ignas [8]
Liachko, Ivan [7]
Haryoko, Tri [9]
Jonsson, Knud A. [10]
Zhou, Qi [5, 11, 12]
Irestedt, Martin [3]
Suh, Alexander [1, 2, 13]
Affiliations
[1] Uppsala Univ, Dept Ecol & Genet Evolutionary Biol, Sci Life Labs, Uppsala, Sweden
[2] Uppsala Univ, Dept Organismal Biol Systemat Biol, Sci Life Labs, Uppsala, Sweden
[3] Swedish Museum Nat Hist, Dept Bioinformat & Genet, Stockholm, Sweden
[4] Leibniz Inst Evolut & Biodiversitatsforsch Berlin, Museum Nat Kunde, Berlin, Germany
[5] Univ Vienna, Dept Neurosci & Dev Biol, Vienna, Austria
[6] Friedrich Schiller Univ Jena, Inst Ecol & Evolut, Dept Populat Ecol, Jena, Germany
[7] Phase Genom, Seattle, WA USA
[8] Uppsala Univ, Uppsala Genome Ctr, Dept Immunol Genet & Pathol, Sci Life Lab, Uppsala, Sweden
[9] Indonesian Inst Sci UPI, Res Ctr Biol, Museum Zool Bogoriense, Cibinong, Indonesia
[10] Univ Copenhagen, Nat Hist Museum Denmark, Copenhagen, Denmark
[11] Zhejiang Univ, Life Sci Inst, MOE Lab Biosyst Homeostasis & Protect, Hangzhou, Peoples R China
[12] Zhejiang Univ, Affiliated Hosp 2, Ctr Reprod Med, Sch Med, Hangzhou, Peoples R China
[13] Univ East Anglia, Sch Biol Sci Organisms & Environm, Norwich, Norfolk, England
Funding
Swedish Research Council
Keywords
chromosome-level assembly; GC content; genome assembly; Hi-C; long reads; satellite repeat; transposable element; TRANSPOSABLE ELEMENTS; LIBRARY PREPARATION; HIDDEN GENES; LONG-READ; IN-VITRO; G4; DNA; NOVO; ANNOTATION; EVOLUTION; RNA;
DOI
10.1111/1755-0998.13252
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010 ; 081704 ;
Abstract
Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.
Pages: 263-286
Number of pages: 24