HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies

Cited by: 213
Authors
Edge, Peter [1]
Bafna, Vineet [1]
Bansal, Vikas [2]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, La Jolla, CA 92053 USA
[2] Univ Calif San Diego, Sch Med, Dept Pediat, La Jolla, CA 92053 USA
Funding
U.S. National Science Foundation; U.S. National Institutes of Health
Keywords
PROXIMITY-LIGATION; HUMAN GENOME; WHOLE; ALGORITHM; SHOTGUN; DESIGN;
DOI
10.1101/gr.213462.116
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Many tools have been developed for haplotype assembly, the reconstruction of individual haplotypes using reads mapped to a reference genome sequence. Due to increasing interest in obtaining haplotype-resolved human genomes, a range of new sequencing protocols and technologies have been developed to enable the reconstruction of whole-genome haplotypes. However, existing computational methods designed to handle specific technologies do not scale well on data from different protocols. We describe a new algorithm, HapCUT2, that extends our previous method (HapCUT) to handle multiple sequencing technologies. Using simulations and whole-genome sequencing (WGS) data from multiple different data types (dilution pool sequencing, linked-read sequencing, single molecule real-time (SMRT) sequencing, and proximity ligation (Hi-C) sequencing), we show that HapCUT2 rapidly assembles haplotypes with best-in-class accuracy for all data types. In particular, HapCUT2 scales well for high sequencing coverage and rapidly assembled haplotypes for two long-read WGS data sets on which other methods struggled. Further, HapCUT2 directly models Hi-C specific error modalities, resulting in significant improvements in error rates compared to HapCUT, the only other method that could assemble haplotypes from Hi-C data. Using HapCUT2, haplotype assembly from a 90x coverage whole-genome Hi-C data set yielded high-resolution haplotypes (78.6% of variants phased in a single block) with high pairwise phasing accuracy (~98% across chromosomes). Our results demonstrate that HapCUT2 is a robust tool for haplotype assembly applicable to data from diverse sequencing technologies.
Pages: 801-812
Page count: 12