Polishing copy number variant calls on exome sequencing data via deep learning

Cited by: 9
Authors
Ozden, Furkan [1]
Alkan, Can [1]
Cicek, A. Ercument [1,2]
Affiliations
[1] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
[2] Carnegie Mellon Univ, Computat Biol Dept, Pittsburgh, PA 15213 USA
Keywords
WHOLE-GENOME;
DOI
10.1101/gr.274845.120
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses the matched WES and WGS data, and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves the performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.
Pages: 1170-1182
Number of pages: 13