NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks

Cited by: 42
Authors
Ahsan, Mian Umair [1]
Liu, Qian [1]
Fang, Li [1]
Wang, Kai [1,2]
Affiliations
[1] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[2] Univ Penn, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
Keywords
Variant calling; Long-range haplotype; Deep learning; Difficult-to-map regions; Human genome
DOI
10.1186/s13059-021-02472-2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Long-read sequencing enables variant detection in genomic regions that are considered difficult-to-map by short-read sequencing. To fully exploit the benefits of longer reads, here we present a deep learning method NanoCaller, which detects SNPs using long-range haplotype information, then phases long reads with called SNPs and calls indels with local realignment. Evaluation on 8 human genomes demonstrates that NanoCaller generally achieves better performance than competing approaches. We experimentally validate 41 novel variants in a widely used benchmarking genome, which could not be reliably detected previously. In summary, NanoCaller facilitates the discovery of novel variants in complex genomic regions from long-read sequencing.
Pages: 33
Related papers
48 in total
  • [1] Ahsan MU, 2021, **DATA OBJECT**, DOI 10.5281/zenodo.5176764
  • [2] Single-Molecule Sequencing: Towards Clinical Applications
    Ameur, Adam
    Kloosterman, Wigard P.
    Hestand, Matthew S.
    [J]. TRENDS IN BIOTECHNOLOGY, 2019, 37 (01) : 72 - 85
  • [3] [Anonymous], arXiv, DOI 10.48550/arXiv.1207.3907
  • [4] [Anonymous], 2015, GIAB HG001 PACBIO CL
  • [5] The potential and challenges of nanopore sequencing
    Branton, Daniel
    Deamer, David W.
    Marziali, Andre
    Bayley, Hagan
    Benner, Steven A.
    Butler, Thomas
    Di Ventra, Massimiliano
    Garaj, Slaven
    Hibbs, Andrew
    Huang, Xiaohua
    Jovanovich, Stevan B.
    Krstic, Predrag S.
    Lindsay, Stuart
    Ling, Xinsheng Sean
    Mastrangelo, Carlos H.
    Meller, Amit
    Oliver, John S.
    Pershin, Yuriy V.
    Ramsey, J. Michael
    Riehn, Robert
    Soni, Gautam V.
    Tabard-Cossa, Vincent
    Wanunu, Meni
    Wiggin, Matthew
    Schloss, Jeffery A.
    [J]. NATURE BIOTECHNOLOGY, 2008, 26 (10) : 1146 - 1153
  • [6] Comprehensive evaluation and characterisation of short read general-purpose structural variant calling software
    Cameron, Daniel L.
    Di Stefano, Leon
    Papenfuss, Anthony T.
    [J]. NATURE COMMUNICATIONS, 2019, 10 (1)
  • [7] An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes
    Cho, Yun Sung
    Kim, Hyunho
    Kim, Hak-Min
    Jho, Sungwoong
    Jun, JeHoon
    Lee, Yong Joo
    Chae, Kyun Shik
    Kim, Chang Geun
    Kim, Sangsoo
    Eriksson, Anders
    Edwards, Jeremy S.
    Lee, Semin
    Kim, Byung Chul
    Manica, Andrea
    Oh, Tae-Kwang
    Church, George M.
    Bhak, Jong
    [J]. NATURE COMMUNICATIONS, 2016, 7
  • [8] Cleary J. G., 2015, BIORXIV, DOI 10.1101/023754
  • [9] Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing
    Edge, Peter
    Bansal, Vikas
    [J]. NATURE COMMUNICATIONS, 2019, 10 (1)
  • [10] HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies
    Edge, Peter
    Bafna, Vineet
    Bansal, Vikas
    [J]. GENOME RESEARCH, 2017, 27 (05) : 801 - 812