A graph-based algorithm for estimating clonal haplotypes of tumor sample from sequencing data

Cited by: 2
Authors
Wang, Yixuan [1 ,2 ]
Zhang, Xuanping [1 ,2 ]
Ding, Shuai [3 ]
Geng, Yu [1 ,2 ]
Liu, Jianye [1 ,2 ]
Zhao, Zhongmeng [1 ,2 ]
Zhang, Rong [1 ,2 ]
Xiao, Xiao [2 ,4 ]
Wang, Jiayin [1 ,2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Dept Comp Sci & Technol, Xian 710048, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Shaanxi Engn Res Ctr Med & Hlth Big Data, Xian 710048, Shaanxi, Peoples R China
[3] Hefei Univ Technol, Sch Management, Minist Educ, Key Lab Proc Optimizat & Intelligent Decis Making, Hefei 23009, Anhui, Peoples R China
[4] Xi An Jiao Tong Univ, Inst Hlth Adm & Policy, Sch Publ Policy & Adm, Xian 710048, Shaanxi, Peoples R China
Funding
China Postdoctoral Science Foundation; US National Science Foundation;
Keywords
Cancer genomics; Haplotype phasing; Clonal haplotype; Computational pipeline; Sequencing data analysis; CANCER; EVOLUTION; MUTATIONS; INFERENCE; HISTORY;
DOI
10.1186/s12920-018-0457-4
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: Haplotype phasing is an important step in many bioinformatics workflows. In cancer genomics, reconstructing the clonal haplotypes of a tumor sample is thought to facilitate a comprehensive understanding of its clonal architecture and to provide a valuable reference for clinical diagnosis and treatment. However, the sequencing data are an admixture of reads sampled from different clonal haplotypes. This complicates the computational problem by exponentially enlarging the solution space and drives existing algorithms to unacceptable time and space complexity. In addition, the evolutionary process among clonal haplotypes further weakens those algorithms by introducing indistinguishable candidate solutions.

Results: To improve the algorithmic performance of phasing clonal haplotypes, we propose MixSubHap, a graph-based computational pipeline that works on cancer sequencing data. To reduce computational complexity, MixSubHap adopts three bounding strategies that limit the solution space and filter out false-positive candidates. First, it estimates the global clonal structure by clustering the variant allelic frequencies of sampled point mutations. This offers a prior on the number of clonal haplotypes when copy-number variations are not considered. Second, it uses a greedy extension algorithm to approximately find the longest linkage of the locally assembled contigs. Finally, it incorporates a read-depth stripping algorithm to filter out false linkages according to the posterior estimate of tumor purity and the estimated percentage of each sub-clone in the sample. A series of experiments was conducted to verify the performance of the proposed pipeline.

Conclusions: The results demonstrate that MixSubHap identifies about 90% of the preset clonal haplotypes on average under different simulation configurations. In particular, MixSubHap remains robust as the mutation rate decreases: in these cases the longest assembled contig can reach 10 kbp, while the accuracy of assigning a mutation to its haplotype remains above 60% on average. MixSubHap is considered a practical algorithm for reconstructing clonal haplotypes from cancer sequencing data. The source code is maintained at https://github.com/YixuanWang1120/MixSubHap for academic use only.
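
The first bounding strategy in the abstract, clustering variant allelic frequencies (VAFs) to obtain a prior on the number of clonal haplotypes, can be illustrated with a minimal sketch. This is not the MixSubHap implementation: the gap-based one-dimensional clustering, the function names `cluster_vafs` and `summarize`, the `max_gap` threshold, and the example VAF values are all assumptions chosen for illustration, and copy-number variations are ignored, as in the abstract.

```python
from statistics import mean


def cluster_vafs(vafs, max_gap=0.08):
    """Group VAFs into clusters (hypothetical helper, not from MixSubHap).

    The values are sorted and a new cluster is started whenever the
    distance to the previous value exceeds max_gap (an assumed threshold).
    """
    if not vafs:
        return []
    ordered = sorted(vafs)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > max_gap:
            clusters.append([v])      # large gap: open a new cluster
        else:
            clusters[-1].append(v)    # small gap: extend the current cluster
    return clusters


def summarize(clusters):
    """Return (mean VAF, number of mutations) for each cluster."""
    return [(round(mean(c), 3), len(c)) for c in clusters]


# Made-up VAFs for nine point mutations in a single tumor sample.
vafs = [0.48, 0.11, 0.52, 0.27, 0.09, 0.50, 0.25, 0.30, 0.12]
clusters = cluster_vafs(vafs)
summary = summarize(clusters)

# The number of clusters serves as a rough prior on the number of clones.
print(len(clusters), summary)
```

In this toy input the mutations fall into three well-separated VAF groups, so the sketch suggests three clonal populations. Real data are noisier, and a model-based clustering method would likely be preferred over a fixed gap threshold.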
Pages: 12