Discovery and genotyping of structural variation from long-read haploid genome sequence data

Cited by: 236
Authors
Huddleston, John [1 ,2 ]
Chaisson, Mark J. P. [1 ]
Steinberg, Karyn Meltz [3 ]
Warren, Wes [3 ]
Hoekzema, Kendra [1 ]
Gordon, David [1 ,2 ]
Graves-Lindsay, Tina A. [3 ]
Munson, Katherine M. [1 ]
Kronenberg, Zev N. [1 ]
Vives, Laura [1 ]
Peluso, Paul [4 ]
Boitano, Matthew [4 ]
Chin, Chen-Shan [4 ]
Korlach, Jonas [4 ]
Wilson, Richard K. [5 ]
Eichler, Evan E. [1 ,2 ]
Affiliations
[1] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[2] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
[3] Washington Univ, Sch Med, McDonnell Genome Inst, Dept Med,Dept Genet, St Louis, MO 63108 USA
[4] Pacific Biosci Calif Inc, Menlo Pk, CA 94025 USA
[5] Univ Pittsburgh, Dept Pathol, Pittsburgh, PA 15261 USA
Funding
National Institutes of Health (USA);
Keywords
COPY NUMBER VARIATION; FRAMEWORK; RESOURCE; ORIGIN;
DOI
10.1101/gr.214007.116
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF >1%). We estimate that this theoretical human diploid differs by as much as ~16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ~59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
Pages: 677-685
Page count: 9