Discovery and genotyping of structural variation from long-read haploid genome sequence data

Cited by: 236
Authors
Huddleston, John [1, 2]
Chaisson, Mark J. P. [1]
Steinberg, Karyn Meltz [3]
Warren, Wes [3]
Hoekzema, Kendra [1]
Gordon, David [1, 2]
Graves-Lindsay, Tina A. [3]
Munson, Katherine M. [1]
Kronenberg, Zev N. [1]
Vives, Laura [1]
Peluso, Paul [4]
Boitano, Matthew [4]
Chin, Chen-Shan [4]
Korlach, Jonas [4]
Wilson, Richard K. [5]
Eichler, Evan E. [1, 2]
Affiliations
[1] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[2] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
[3] Washington Univ, Sch Med, McDonnell Genome Inst, Dept Med, Dept Genet, St Louis, MO 63108 USA
[4] Pacific Biosci Calif Inc, Menlo Pk, CA 94025 USA
[5] Univ Pittsburgh, Dept Pathol, Pittsburgh, PA 15261 USA
Funding
National Institutes of Health (NIH)
Keywords
COPY NUMBER VARIATION; FRAMEWORK; RESOURCE; ORIGIN;
DOI
10.1101/gr.214007.116
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010 ; 081704 ;
Abstract
In an effort to better understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants were missed in the analysis of the 1000 Genomes Project, even after adjusting for more common variants (MAF >1%). We estimate that this theoretical human diploid differs by as much as ~16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ~59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
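The abstract's central methodological point is that once an SV's alternate allele is sequence-resolved, genotyping can be uncoupled from discovery and done in short-read data. As a hedged illustration only (a generic read-count model, not the SMRT-SV genotyper), the sketch below picks the most likely genotype of a biallelic SV from the number of short reads supporting the reference and the alternate allele, assuming reads have already been assigned to one allele or the other. The function name `genotype_sv` and the default per-read error rate of 0.02 are hypothetical choices.

```python
import math


def genotype_sv(ref_reads: int, alt_reads: int, error: float = 0.02) -> str:
    """Return the most likely genotype ("0/0", "0/1", "1/1") for a biallelic SV.

    Hypothetical model (not SMRT-SV's): each read independently supports the
    alternate allele with probability `error` (0/0), 0.5 (0/1) or 1 - error (1/1).
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    if not 0.0 < error < 0.5:
        raise ValueError("error must lie strictly between 0 and 0.5")
    if ref_reads + alt_reads == 0:
        return "./."  # no informative reads: no call

    # Expected fraction of alternate-supporting reads under each genotype.
    candidates = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}

    # Binomial log-likelihood of the observed alternate count; the binomial
    # coefficient is identical for all genotypes, so it is dropped.
    log_lik = {
        gt: alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for gt, p in candidates.items()
    }
    return max(log_lik, key=log_lik.get)
```

For example, 10 reference-supporting and 10 alternate-supporting reads favor `0/1`, while 15 alternate-only reads favor `1/1`. Practical genotypers also account for mapping quality and reads that fit both alleles equally well, which this sketch ignores.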
Pages: 677-685
Page count: 9
Related papers (50 in total)
  • [1] Discovery and genotyping of structural variation from long-read haploid genome sequence data (vol 27, pg 677, 2017)
    Huddleston, John
    Chaisson, Mark J. P.
    Steinberg, Karyn Meltz
    Warren, Wes
    Hoekzema, Kendra
    Gordon, David
    Graves-Lindsay, Tina A.
    Munson, Katherine M.
    Kronenberg, Zev N.
    Vives, Laura
    Peluso, Paul
    Boitano, Matthew
    Chin, Chen-Shan
    Korlach, Jonas
    Wilson, Richard K.
    Eichler, Evan E.
    GENOME RESEARCH, 2018, 28 (01) : 144 - 144
  • [2] Long-read genotyping with SLANG (Simple Long-read loci Assembly of Nanopore data for Genotyping)
    Dorfner, Marco
    Ott, Tankred
    Ott, Philipp
    Oberprieler, Christoph
    APPLICATIONS IN PLANT SCIENCES, 2022, 10 (03):
  • [3] Population-scale genotyping of structural variation in the era of long-read sequencing
    Quan, Cheng
    Lu, Hao
    Lu, Yiming
    Zhou, Gangqiao
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2022, 20 : 2639 - 2647
  • [4] Long-read sequence assembly of the gorilla genome
    Gordon, David
    Huddleston, John
    Chaisson, Mark J. P.
    Hill, Christopher M.
    Kronenberg, Zev N.
    Munson, Katherine M.
    Malig, Maika
    Raja, Archana
    Fiddes, Ian
    Hillier, LaDeana W.
    Dunn, Christopher
    Baker, Carl
    Armstrong, Joel
    Diekhans, Mark
    Paten, Benedict
    Shendure, Jay
    Wilson, Richard K.
    Haussler, David
    Chin, Chen-Shan
    Eichler, Evan E.
    SCIENCE, 2016, 352 (6281)
  • [5] A chromosome-level genome of mango exclusively from long-read sequence data
    Wijesundara, Upendra Kumari
    Masouleh, Ardashir Kharabian
    Furtado, Agnelo
    Dillon, Natalie L.
    Henry, Robert J.
    PLANT GENOME, 2024, 17 (02):
  • [6] Genome structural variation discovery and genotyping
    Alkan, Can
    Coe, Bradley P.
    Eichler, Evan E.
    NATURE REVIEWS GENETICS, 2011, 12 : 363 - 376
  • [7] Long-read genome sequencing identifies causal structural variation in a Mendelian disease
    Merker, Jason D.
    Wenger, Aaron M.
    Sneddon, Tam
    Grove, Megan
    Zappala, Zachary
    Fresard, Laure
    Waggott, Daryl
    Utiramerur, Sowmi
    Hou, Yanli
    Smith, Kevin S.
    Montgomery, Stephen B.
    Wheeler, Matthew
    Buchan, Jillian G.
    Lambert, Christine C.
    Eng, Kevin S.
    Hickey, Luke
    Korlach, Jonas
    Ford, James
    Ashley, Euan A.
    GENETICS IN MEDICINE, 2018, 20 (01) : 159 - 163
  • [8] Comprehensive evaluation of structural variant genotyping methods based on long-read sequencing data
    Duan, Xiaoke
    Pan, Mingpei
    Fan, Shaohua
    BMC GENOMICS, 2022, 23 (01)
  • [9] SVLR: Genome Structural Variant Detection Using Long-Read Sequencing Data
    Gu, Wenyan
    Zhou, Aizhong
    Wang, Lusheng
    Sun, Shiwei
    Cui, Xuefeng
    Zhu, Daming
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2021, 28 (08) : 774 - 788