Haplotype-aware diplotyping from noisy long reads

Cited by: 35
Authors
Ebler, Jana [1, 2, 3]
Haukness, Marina [4]
Pesout, Trevor [4]
Marschall, Tobias [1, 2]
Paten, Benedict [4]
Affiliations
[1] Saarland Univ, Ctr Bioinformat, Saarland Informat Campus E2-1, D-66123 Saarbrucken, Germany
[2] Max Planck Inst Informat, Saarland Informat Campus E1-4, Saarbrucken, Germany
[3] Saarland Univ, Grad Sch Comp Sci, Saarland Informat Campus E1-3, Saarbrucken, Germany
[4] Univ Calif Santa Cruz, UC Santa Cruz Genom Inst, Santa Cruz, CA 95064 USA
Funding
US National Institutes of Health
Keywords
Computational genomics; Long reads; Genotyping; Phasing; Haplotypes; Diplotypes; Human genome; Accurate; Methylation; Complexity; Efficient
DOI
10.1186/s13059-019-1709-0
Chinese Library Classification number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Current genotyping approaches for single-nucleotide variations rely on short, accurate reads from second-generation sequencing devices. Presently, third-generation sequencing platforms are rapidly becoming more widespread, yet approaches for leveraging their long but error-prone reads for genotyping are lacking. Here, we introduce a novel statistical framework for the joint inference of haplotypes and genotypes from noisy long reads, which we term diplotyping. Our technique takes full advantage of linkage information provided by long reads. We validate hundreds of thousands of candidate variants that have not yet been included in the high-confidence reference set of the Genome-in-a-Bottle effort.
Pages: 16