Discovery and genotyping of structural variation from long-read haploid genome sequence data

Cited by: 236
Authors
Huddleston, John [1 ,2 ]
Chaisson, Mark J. P. [1 ]
Steinberg, Karyn Meltz [3 ]
Warren, Wes [3 ]
Hoekzema, Kendra [1 ]
Gordon, David [1 ,2 ]
Graves-Lindsay, Tina A. [3 ]
Munson, Katherine M. [1 ]
Kronenberg, Zev N. [1 ]
Vives, Laura [1 ]
Peluso, Paul [4 ]
Boitano, Matthew [4 ]
Chin, Chen-Shin [4 ]
Korlach, Jonas [4 ]
Wilson, Richard K. [5 ]
Eichler, Evan E. [1 ,2 ]
Affiliations
[1] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[2] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
[3] Washington Univ, Sch Med, McDonnell Genome Inst, Dept Med,Dept Genet, St Louis, MO 63108 USA
[4] Pacific Biosci Calif Inc, Menlo Pk, CA 94025 USA
[5] Univ Pittsburgh, Dept Pathol, Pittsburgh, PA 15261 USA
Funding
US National Institutes of Health;
Keywords
COPY NUMBER VARIATION; FRAMEWORK; RESOURCE; ORIGIN;
DOI
10.1101/gr.214007.116
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF >1%). We estimate that this theoretical human diploid differs by as much as ~16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ~59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
Pages: 677 - 685
Page count: 9
Related papers
50 in total
  • [31] Structural variation and its potential impact on genome instability: Novel discoveries in the EGFR landscape by long-read sequencing
    Cook, George W.
    Benton, Michael G.
    Akerley, Wallace
    Mayhew, George F.
    Moehlenkamp, Cynthia
    Raterman, Denise
    Burgess, Daniel L.
    Rowell, William J.
    Lambert, Christine
    Eng, Kevin
    Gu, Jenny
    Baybayan, Primo
    Fussell, John T.
    Herbold, Heath D.
    O'Shea, John M.
    Varghese, Thomas K.
    Emerson, Lyska L.
    PLOS ONE, 2020, 15 (01):
  • [32] Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences
    Chiu, Readman
    Rajan-Babu, Indhu-Shree
    Friedman, Jan M.
    Birol, Inanc
    GENOME BIOLOGY, 2021, 22 (01)
  • [34] Benchmarking long-read aligners and SV callers for structural variation detection in Oxford nanopore sequencing data
    Helal, Asmaa A.
    Saad, Bishoy T.
    Saad, Mina T.
    Mosaad, Gamal S.
    Aboshanab, Khaled M.
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [37] Decoil: Reconstructing Extrachromosomal DNA Structural Heterogeneity from Long-Read Sequencing Data
    Giurgiu, Madalina
    Wittstruck, Nadine
    Rodriguez-Fos, Elias
    Gonzalez, Rocio Chamorro
    Brueckner, Lotte
    Krienelke-Szymansky, Annabell
    Helmsauer, Konstantin
    Hartebrodt, Anne
    Euskirchen, Philipp
    Koche, Richard P.
    Haase, Kerstin
    Reinert, Knut
    Henssen, Anton G.
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, RECOMB 2024, 2024, 14758 : 406 - 411
  • [38] A survey of algorithms for the detection of genomic structural variants from long-read sequencing data
    Ahsan, Mian Umair
    Liu, Qian
    Perdomo, Jonathan Elliot
    Fang, Li
    Wang, Kai
    NATURE METHODS, 2023, 20 (08) : 1143 - 1158
  • [39] Genome sequencing using long-read sequencing
    McEwen, Juan Guillermo
    Gomez, Oscar Mauricio
    REVISTA DE LA ACADEMIA COLOMBIANA DE CIENCIAS EXACTAS FISICAS Y NATURALES, 2023, 47 (183) : 439 - 444
  • [40] Long-read sequence and assembly of segmental duplications
    Vollger, Mitchell R.
    Dishuck, Philip C.
    Sorensen, Melanie
    Welch, AnneMarie E.
    Dang, Vy
    Dougherty, Max L.
    Graves-Lindsay, Tina A.
    Wilson, Richard K.
    Chaisson, Mark J. P.
    Eichler, Evan E.
    NATURE METHODS, 2019, 16 (01) : 88 - +