Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes

Cited by: 16
Authors
Bansal, Vikas [1]
Affiliations
[1] Univ Calif San Diego, Sch Med, Dept Pediat, La Jolla, CA 92093 USA
Keywords
ALGORITHM;
DOI
10.1093/bioinformatics/btz329
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: Reconstruction of haplotypes for human genomes is an important problem in medical and population genetics. Hi-C sequencing generates read pairs with long-range haplotype information that can be computationally assembled to generate chromosome-spanning haplotypes. However, the haplotypes have limited completeness and low accuracy. Haplotype information from population reference panels can potentially be used to improve the completeness and accuracy of Hi-C haplotyping.

Results: In this paper, we describe a likelihood-based method to integrate short-range haplotype information from a population reference panel of haplotypes with the long-range haplotype information present in sequence reads from methods such as Hi-C to assemble dense and highly accurate haplotypes for individual genomes. Our method leverages a statistical phasing method and a maximum spanning tree algorithm to determine the optimal second-order approximation of the population-based haplotype likelihood for an individual genome. The population-based likelihood is encoded using pseudo-reads, which are then used as input along with sequence reads for haplotype assembly using an existing tool, HapCUT2. Using whole-genome Hi-C data for two human genomes (NA19240 and NA12878), we demonstrate that this integrated phasing method enables the phasing of 97-98% of variants, reduces the switch error rates by 3- to 6-fold, and outperforms an existing method for combining phase information from sequence reads with population-based phasing. On Strand-seq data for NA12878, our method improves the haplotype completeness from 71.4% to 94.6% and reduces the switch error rate 2-fold, demonstrating its utility for phasing using multiple sequencing technologies.

Availability and implementation: Code and datasets are available at https://github.com/vibansal/IntegratedPhasing.
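
The maximum spanning tree step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how pairwise weights between variants are computed or how tree edges are turned into pseudo-reads, so everything below is an assumption for illustration only: the function name `maximum_spanning_tree`, the toy weights, and the use of Kruskal's algorithm with a union-find structure are hypothetical choices, not the paper's implementation. The idea is that each heterozygous variant is a node, each pair of variants carries a weight reflecting how strongly the population panel supports their relative phase, and the tree keeps the strongest set of pairwise links that connects all variants without cycles.

```python
from typing import List, Tuple

# (variant_i, variant_j, weight); the weight is a hypothetical pairwise
# phase-confidence score derived from a reference panel.
Edge = Tuple[int, int, float]


def maximum_spanning_tree(num_variants: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm on edges sorted by decreasing weight."""
    parent = list(range(num_variants))

    def find(x: int) -> int:
        # Find the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree: List[Edge] = []
    for i, j, w in sorted(edges, key=lambda e: e[2], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # The edge joins two separate components, so keep it.
            parent[root_i] = root_j
            tree.append((i, j, w))
    return tree


# Toy example: four variants with made-up pairwise weights.
edges = [(0, 1, 5.0), (1, 2, 0.5), (0, 2, 3.0), (2, 3, 4.0), (1, 3, 1.0)]
tree = maximum_spanning_tree(4, edges)
for i, j, w in tree:
    print(f"variant {i} - variant {j}: weight {w}")
```

In this sketch each retained edge links exactly two variants, which is the shape a two-variant pseudo-read would need; how the published method scores and encodes those links is described in the paper itself, not here.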
Pages: I242-I248
Number of pages: 7