Discovery and genotyping of structural variation from long-read haploid genome sequence data

Cited by: 235
Authors
Huddleston, John [1, 2]
Chaisson, Mark J. P. [1]
Steinberg, Karyn Meltz [3]
Warren, Wes [3]
Hoekzema, Kendra [1]
Gordon, David [1, 2]
Graves-Lindsay, Tina A. [3]
Munson, Katherine M. [1]
Kronenberg, Zev N. [1]
Vives, Laura [1]
Peluso, Paul [4]
Boitano, Matthew [4]
Chin, Chen-Shan [4]
Korlach, Jonas [4]
Wilson, Richard K. [5]
Eichler, Evan E. [1, 2]
Affiliations
[1] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[2] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
[3] Washington Univ, Sch Med, McDonnell Genome Inst, Dept Med, Dept Genet, St Louis, MO 63108 USA
[4] Pacific Biosci Calif Inc, Menlo Pk, CA 94025 USA
[5] Univ Pittsburgh, Dept Pathol, Pittsburgh, PA 15261 USA
Funding
National Institutes of Health (USA)
Keywords
COPY NUMBER VARIATION; FRAMEWORK; RESOURCE; ORIGIN
DOI
10.1101/gr.214007.116
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF >1%). We estimate that this theoretical human diploid differs by as much as ~16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ~59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
Pages: 677-685
Page count: 9