DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data

Times cited: 10
Authors
Khazeeva, Gelana [1]
Sablauskas, Karolis [2]
van der Sanden, Bart [3]
Steyaert, Wouter [1]
Kwint, Michael [1]
Rots, Dmitrijs [3]
Hinne, Max [4]
van Gerven, Marcel [4]
Yntema, Helger [3]
Vissers, Lisenka [3]
Gilissen, Christian [1]
Affiliations
[1] Radboud Univ Nijmegen, Radboud Inst Mol Life Sci, Dept Human Genet, Med Ctr, Geert Grootepl 10, NL-6525 GA Nijmegen, Netherlands
[2] Vilnius Univ, Inst Clin Med, Fac Med, Vilnius, Lithuania
[3] Radboud Univ Nijmegen, Dept Human Genet, Donders Ctr Neurosci, Med Ctr, Geert Grootepl 10, NL-6525 GA Nijmegen, Netherlands
[4] Radboud Univ Nijmegen, Donders Inst Brain Cognit & Behav, Nijmegen, Netherlands
Keywords
DISCOVERY; FRAMEWORK
DOI
10.1093/nar/gkac511
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
De novo mutations (DNMs) are an important cause of genetic disorders. The accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequencing errors, uneven coverage, and mapping artifacts. Here, we developed DeNovoCNN, a deep convolutional neural network (CNN) DNM caller that encodes the alignment of sequence reads for a trio as 160 × 164 resolution images. DeNovoCNN was trained on DNMs from 5616 whole exome sequencing (WES) trios and achieved a total recall of 96.74% and a precision of 96.55% on the test dataset. Based on the Genome in a Bottle reference dataset and on independent WES and whole genome sequencing (WGS) trios, we find that DeNovoCNN has higher recall (sensitivity) and precision than existing DNM calling approaches (GATK, DeNovoGear, DeepTrio, Samtools). Validation of DNMs by Sanger and PacBio HiFi sequencing confirms that DeNovoCNN outperforms existing methods. Most importantly, our results suggest that DeNovoCNN is likely robust to different exome sequencing and analysis approaches, which allows it to be applied to other datasets. DeNovoCNN is freely available as a Docker container and can be run on existing alignment (BAM/CRAM) and variant calling (VCF) files from WES and WGS without the need for variant re-calling.
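To make the idea of encoding a trio's read alignments as an image more concrete, here is a minimal Python sketch. It is not DeNovoCNN's actual encoding. The base-to-intensity mapping (`BASE_CODE`), the layout with one read per row and one position per column, the ungapped `(offset, sequence)` read representation, and the helper names `encode_sample` and `encode_trio` are all hypothetical. The sketch only shows how child, father, and mother pileups around a candidate site can be stacked as three channels of a fixed-size image. The abstract reports 160 × 164 images but does not say how rows and columns are assigned, so the grid size is left as a parameter.

```python
from typing import Dict, List, Tuple

# Hypothetical base-to-intensity mapping (not DeNovoCNN's actual scheme).
# Unknown bases and empty cells stay 0.0.
BASE_CODE: Dict[str, float] = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

# A read as (offset relative to the window start, ungapped sequence).
Read = Tuple[int, str]
Grid = List[List[float]]


def encode_sample(reads: List[Read], n_rows: int, n_cols: int) -> Grid:
    """Encode one sample's reads as an n_rows x n_cols grid (row = read, column = position)."""
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    # Reads beyond n_rows are dropped; shallower pileups are zero-padded.
    for row, (offset, seq) in enumerate(reads[:n_rows]):
        for i, base in enumerate(seq):
            col = offset + i
            if 0 <= col < n_cols:  # clip bases that fall outside the window
                grid[row][col] = BASE_CODE.get(base.upper(), 0.0)
    return grid


def encode_trio(child: List[Read], father: List[Read], mother: List[Read],
                n_rows: int, n_cols: int) -> List[Grid]:
    """Stack child, father and mother grids as three channels: [channel][row][column]."""
    return [encode_sample(reads, n_rows, n_cols) for reads in (child, father, mother)]


if __name__ == "__main__":
    # Toy pileups: the child carries T at window position 3; both parents carry A.
    image = encode_trio([(0, "ACGT"), (1, "CGTA")], [(0, "ACGA")], [(0, "ACGA")],
                        n_rows=4, n_cols=6)
    print([channel[0][3] for channel in image])  # → [1.0, 0.25, 0.25]
```

In this toy example, position 3 differs between the child's channel and both parental channels. That kind of child-only signal is what a CNN trained on such images could learn to score as a candidate DNM, while sequencing errors or mapping artifacts would tend to produce different patterns across reads.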
Pages: 10