BagGMM: Calling copy number variation by bagging multiple Gaussian mixture models from tumor and matched normal next-generation sequencing data

Cited by: 7
Authors
Li, Yaoyao [1 ]
Zhang, Junying [1 ]
Yuan, Xiguo [1 ]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Shaanxi, Peoples R China
Keywords
Next generation sequencing; Copy number variation; Whole-genome sequencing; Gaussian Mixture Model (GMM); Read depth; STRUCTURAL VARIATION; READ ALIGNMENT; TOOL; IDENTIFICATION; VARIANTS;
DOI
10.1016/j.dsp.2019.01.025
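Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract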
Copy number variations (CNVs) contribute significantly to human genomic variability, and some of them lead to disease. However, effective detection of CNVs from whole-genome next-generation sequencing (NGS) data remains challenging. Here, we present BagGMM, a new method for calling CNVs from matched tumor and normal NGS samples. BagGMM extracts the read depth ratios of tumor samples to normal samples, divides the genomic sequence into segments with sliding windows and computes the average coverage ratio of each segment, filters candidate deletions and duplications using a coarse coverage-ratio criterion, and then builds a Gaussian mixture model (GMM) on the remaining ratios to identify the copy number states that are still ambiguous after filtering. Bagging multiple GMMs, rather than using a single GMM, reduces false positive calls and thus enhances the detection power of BagGMM. To balance the computation speed of the GMMs against false positive calls, we employ a "large window first, then small windows" segmentation procedure, which also helps to determine the boundaries of CNV regions. We apply BagGMM to three simulated datasets and to two groups of human whole-genome sequencing (WGS) data, from breast cancer patients and ovarian cancer patients respectively, to identify CNVs. All experiments demonstrate that BagGMM can robustly identify CNVs of different sizes and states. Its performance is compared with that of four existing CNV detection methods. BagGMM shows a significant improvement in both sensitivity and specificity for detecting both copy number gains and losses. (C) 2019 Elsevier Inc. All rights reserved.
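The pipeline outlined in the abstract (window-level tumor/normal ratios, a coarse filter, then bagged GMMs on the ambiguous windows) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the coarse thresholds (0.6 and 1.4), the initial component means (0.8, 1.0, 1.2), the number of bagged models, and the use of non-overlapping windows are all assumptions made for the example, and the "large window first, then small windows" refinement is omitted.

```python
import math
import random
from collections import Counter

STATES = ("loss", "neutral", "gain")  # hypothetical labels, one per GMM component

def window_ratios(tumor, normal, size):
    """Tumor/normal depth ratio per non-overlapping window (a simplification)."""
    ratios = []
    for start in range(0, len(tumor) - size + 1, size):
        t = sum(tumor[start:start + size])
        n = sum(normal[start:start + size])
        ratios.append(t / n if n > 0 else 1.0)
    return ratios

def coarse_filter(ratios, low=0.6, high=1.4):
    """Call clear losses/gains directly; thresholds are assumed, not from the paper."""
    calls, ambiguous = {}, []
    for i, r in enumerate(ratios):
        if r < low:
            calls[i] = "loss"
        elif r > high:
            calls[i] = "gain"
        else:
            ambiguous.append(i)
    return calls, ambiguous

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_gmm(data, means, iters=50, min_sigma=0.02):
    """Plain EM for a one-dimensional Gaussian mixture."""
    k = len(means)
    means, sigmas, weights = list(means), [0.1] * k, [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], sigmas[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and spread of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # component received no mass; leave it unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return weights, means, sigmas

def predict(x, model):
    """Index of the component with the highest weighted density at x."""
    weights, means, sigmas = model
    scores = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
    return scores.index(max(scores))

def bag_gmm_calls(ratios, ambiguous, n_models=15, seed=0):
    """Fit one GMM per bootstrap resample and take a majority vote per window."""
    rng = random.Random(seed)
    values = [ratios[i] for i in ambiguous]
    votes = {i: Counter() for i in ambiguous}
    for _ in range(n_models):
        sample = [rng.choice(values) for _ in values]  # bootstrap resample
        model = fit_gmm(sample, means=(0.8, 1.0, 1.2))
        for i in ambiguous:
            votes[i][STATES[predict(ratios[i], model)]] += 1
    return {i: v.most_common(1)[0][0] for i, v in votes.items()}

def call_cnvs(tumor, normal, size):
    """Coarse calls first, then bagged-GMM calls for the remaining windows."""
    ratios = window_ratios(tumor, normal, size)
    calls, ambiguous = coarse_filter(ratios)
    calls.update(bag_gmm_calls(ratios, ambiguous))
    return [calls[i] for i in range(len(ratios))]
```

Calling `call_cnvs(tumor, normal, size)` with two equal-length lists of per-position read depths returns one state label per window. Components are labeled by their initial ordering, which is a simplification; a more careful version would label each component from its fitted mean.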
Pages: 90-100
Page count: 11