WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads

Cited by: 254
Authors
Patterson, Murray [1, 8]
Marschall, Tobias [2, 3, 8]
Pisanti, Nadia [4, 7, 8]
Van Iersel, Leo [5, 8]
Stougie, Leen [5, 6, 7, 8]
Klau, Gunnar W. [5, 6, 7, 8]
Schonhuth, Alexander [5, 8]
Affiliations
[1] Univ Lyon 1, Lab Biometrie & Biol Evolut LBBE UMR CNRS 5558, F-69622 Villeurbanne, France
[2] Univ Saarland, Ctr Bioinformat, D-66123 Saarbrucken, Germany
[3] Max Planck Inst Informat, D-66123 Saarbrucken, Germany
[4] Univ Pisa, Dept Comp Sci, I-56100 Pisa, Italy
[5] Ctr Wiskunde & Informat, Life Sci, Amsterdam, Netherlands
[6] Vrije Univ Amsterdam, Amsterdam, Netherlands
[7] INRIA, Erable Team, Villeurbanne, France
[8] Ctr Wiskunde & Informat, Life Sci Grp, Amsterdam, Netherlands
Keywords
algorithms; combinatorial optimization; dynamic programming; haplotypes; next generation sequencing; ALGORITHM; RECONSTRUCTION
DOI
10.1089/cmb.2014.0157
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The human genome is diploid, which requires assigning heterozygous single nucleotide polymorphisms (SNPs) to the two copies of the genome. The resulting haplotypes, lists of SNPs belonging to each copy, are crucial for downstream analyses in population genetics. Currently, statistical approaches, which are oblivious to direct read information, constitute the state of the art. Haplotype assembly, which addresses phasing directly from sequencing reads, suffers from the fact that sequencing reads of the current generation are too short to serve the purposes of genome-wide phasing. While future-technology sequencing reads will contain sufficient numbers of SNPs per read for phasing, they are also likely to suffer from higher sequencing error rates. Currently, no haplotype assembly approaches exist that take both increasing read length and sequencing error information into account. Here, we suggest WhatsHap, the first approach that yields provably optimal solutions to the weighted minimum error correction problem in runtime linear in the number of SNPs. WhatsHap is a fixed-parameter tractable (FPT) approach with coverage as the parameter. We demonstrate that WhatsHap can handle datasets of coverage up to 20x, and that 15x is generally enough for reliably phasing long reads, even at significantly elevated sequencing error rates. We also find that the switch and flip error rates of the haplotypes we output are favorable when compared with those of state-of-the-art statistical phasers.
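As an illustration of the weighted minimum error correction (wMEC) problem named in the abstract, the following is a minimal Python sketch that solves a toy instance by brute force. It is not the WhatsHap algorithm: it enumerates every bipartition of the reads, so its runtime is exponential in the number of reads, whereas the abstract describes a method that is linear in the number of SNPs for fixed coverage. The read encoding (a dict from SNP index to an allele and a confidence weight) and the function names `wmec_cost` and `brute_force_wmec` are assumptions made for this example only.

```python
from itertools import product


def wmec_cost(reads, assignment, n_snps):
    """Cost of one fixed bipartition of the reads (hypothetical helper).

    For each side of the partition and each SNP, the reads on that side
    must agree on one allele; we pay the summed weight of the cheaper
    allele, i.e. the entries that would have to be corrected.
    """
    total = 0
    for side in (0, 1):
        for snp in range(n_snps):
            weight = [0, 0]  # summed weights supporting allele 0 and allele 1
            for read, part in zip(reads, assignment):
                if part == side and snp in read:
                    allele, w = read[snp]
                    weight[allele] += w
            total += min(weight)
    return total


def brute_force_wmec(reads, n_snps):
    """Try every bipartition; return (minimum cost, assignment of reads to sides)."""
    best = None
    # Fix the first read on side 0 to skip mirror-image partitions.
    for rest in product((0, 1), repeat=len(reads) - 1):
        assignment = (0,) + rest
        cost = wmec_cost(reads, assignment, n_snps)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Toy instance (made-up data): each read maps SNP index -> (allele, weight).
toy_reads = [
    {0: (0, 10), 1: (0, 10)},
    {0: (1, 10), 1: (1, 10)},
    {1: (0, 10), 2: (1, 10)},
    {1: (1, 10), 2: (1, 3)},
    {0: (0, 10), 2: (0, 2)},  # low-weight entry at SNP 2 conflicts with read 2
]

cost, sides = brute_force_wmec(toy_reads, n_snps=3)
print(cost, sides)
```

In this toy instance the only disagreement that cannot be avoided is the low-confidence entry of the last read at SNP 2, so the optimal partition corrects that single entry. Weighting by confidence is what lets the formulation prefer flipping an unreliable base call over a reliable one.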
Pages: 498-509
Page count: 12