Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

Cited by: 80
Authors
Vollger, Mitchell R. [1]
Logsdon, Glennis A. [1]
Audano, Peter A. [1]
Sulovari, Arvis [1]
Porubsky, David [1]
Peluso, Paul [2]
Wenger, Aaron M. [2]
Concepcion, Gregory T. [2]
Kronenberg, Zev N. [2]
Munson, Katherine M. [1]
Baker, Carl [1]
Sanders, Ashley D. [3]
Spierings, Diana C. J. [4]
Lansdorp, Peter M. [4, 5, 6]
Surti, Urvashi [7, 8]
Hunkapiller, Michael W. [2]
Eichler, Evan E. [1, 9]
Affiliations
[1] Univ Washington, Dept Genome Sci, Sch Med, 3720 15th Ave NE S413C,Box 355065, Seattle, WA 98195 USA
[2] Pacific Biosci Calif, Menlo Pk, CA USA
[3] European Mol Biol Lab, Genome Biol Unit, Heidelberg, Germany
[4] Univ Groningen, Univ Med Ctr Groningen, European Res Inst Biol Ageing, Groningen, Netherlands
[5] BC Canc Agcy, Terry Fox Lab, Vancouver, BC, Canada
[6] Univ British Columbia, Dept Med Genet, Vancouver, BC, Canada
[7] Univ Pittsburgh, Sch Med, Dept Pathol, Pittsburgh, PA USA
[8] Univ Pittsburgh, Med Ctr, Pittsburgh, PA USA
[9] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
Funding
National Institutes of Health (USA); European Research Council
Keywords
genome assembly; long-read sequencing; segmental duplications; structural variation; tandem repeats; REGIONS;
DOI
10.1111/ahg.12364
Chinese Library Classification number
Q3 [Genetics];
Subject classification codes
071007 ; 090102 ;
Abstract
The sequence and assembly of human genomes using long-read sequencing technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
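For readers unfamiliar with the NG50 statistic quoted in the abstract, the following is a minimal sketch of the standard definition: the contig length at which contigs of that length or longer cover at least half of a reference size. The function name `ng50`, the toy contig lengths, and the reference size of 200 are illustrative assumptions, not data or code from the paper. The last lines only check the arithmetic behind the reported 2.5-fold figure.

```python
def ng50(contig_lengths, reference_size):
    """Return NG50 under the standard definition (hypothetical helper).

    Contigs are visited from longest to shortest; the NG50 is the length of
    the contig at which the running total first reaches half of
    reference_size. Returns 0 if the contigs never cover half the reference.
    """
    half = reference_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Toy example (made-up lengths): running totals are 50, 90, 120,
# and 120 is the first total to reach half of 200.
toy_contigs = [50, 40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, 200)

# Arithmetic check of the fold increase using the two values in the abstract.
fold_increase = 480.6 / 191.5
print(toy_ng50, round(fold_increase, 2))
```

When the statistic is restricted to a region, as with the pericentromeric NG50 in the abstract, `reference_size` would presumably be the size of that region rather than the whole genome; how the authors defined it is not stated in this record.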
Pages: 125-140
Page count: 16