NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks

Cited by: 48
Authors
Ahsan, Mian Umair [1]
Liu, Qian [1]
Fang, Li [1]
Wang, Kai [1, 2]
Affiliations
[1] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[2] Univ Penn, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
Keywords
Variant calling; Long-range haplotype; Deep learning; Difficult-to-map regions; Human genome
DOI
10.1186/s13059-021-02472-2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Long-read sequencing enables variant detection in genomic regions that are considered difficult-to-map by short-read sequencing. To fully exploit the benefits of longer reads, here we present a deep learning method NanoCaller, which detects SNPs using long-range haplotype information, then phases long reads with called SNPs and calls indels with local realignment. Evaluation on 8 human genomes demonstrates that NanoCaller generally achieves better performance than competing approaches. We experimentally validate 41 novel variants in a widely used benchmarking genome, which could not be reliably detected previously. In summary, NanoCaller facilitates the discovery of novel variants in complex genomic regions from long-read sequencing.
Pages: 33
Related papers
48 in total
[1]  
Ahsan MU, 2021, **DATA OBJECT**, DOI 10.5281/zenodo.5176764
[2]   Single-Molecule Sequencing: Towards Clinical Applications [J].
Ameur, Adam ;
Kloosterman, Wigard P. ;
Hestand, Matthew S. .
TRENDS IN BIOTECHNOLOGY, 2019, 37 (01) :72-85
[3]  
[Anonymous], 2015, GIAB HG001 PACBIO CL
[4]   The potential and challenges of nanopore sequencing [J].
Branton, Daniel ;
Deamer, David W. ;
Marziali, Andre ;
Bayley, Hagan ;
Benner, Steven A. ;
Butler, Thomas ;
Di Ventra, Massimiliano ;
Garaj, Slaven ;
Hibbs, Andrew ;
Huang, Xiaohua ;
Jovanovich, Stevan B. ;
Krstic, Predrag S. ;
Lindsay, Stuart ;
Ling, Xinsheng Sean ;
Mastrangelo, Carlos H. ;
Meller, Amit ;
Oliver, John S. ;
Pershin, Yuriy V. ;
Ramsey, J. Michael ;
Riehn, Robert ;
Soni, Gautam V. ;
Tabard-Cossa, Vincent ;
Wanunu, Meni ;
Wiggin, Matthew ;
Schloss, Jeffery A. .
NATURE BIOTECHNOLOGY, 2008, 26 (10) :1146-1153
[5]   Comprehensive evaluation and characterisation of short read general-purpose structural variant calling software [J].
Cameron, Daniel L. ;
Di Stefano, Leon ;
Papenfuss, Anthony T. .
NATURE COMMUNICATIONS, 2019, 10 (1)
[6]   An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes [J].
Cho, Yun Sung ;
Kim, Hyunho ;
Kim, Hak-Min ;
Jho, Sungwoong ;
Jun, JeHoon ;
Lee, Yong Joo ;
Chae, Kyun Shik ;
Kim, Chang Geun ;
Kim, Sangsoo ;
Eriksson, Anders ;
Edwards, Jeremy S. ;
Lee, Semin ;
Kim, Byung Chul ;
Manica, Andrea ;
Oh, Tae-Kwang ;
Church, George M. ;
Bhak, Jong .
NATURE COMMUNICATIONS, 2016, 7
[7]  
Cleary JG., 2015, BIORXIV, DOI 10.1101/023754
[8]   Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing [J].
Edge, Peter ;
Bansal, Vikas .
NATURE COMMUNICATIONS, 2019, 10 (1)
[9]   HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies [J].
Edge, Peter ;
Bafna, Vineet ;
Bansal, Vikas .
GENOME RESEARCH, 2017, 27 (05) :801-812
[10]   Real-Time DNA Sequencing from Single Polymerase Molecules [J].
Eid, John ;
Fehr, Adrian ;
Gray, Jeremy ;
Luong, Khai ;
Lyle, John ;
Otto, Geoff ;
Peluso, Paul ;
Rank, David ;
Baybayan, Primo ;
Bettman, Brad ;
Bibillo, Arkadiusz ;
Bjornson, Keith ;
Chaudhuri, Bidhan ;
Christians, Frederick ;
Cicero, Ronald ;
Clark, Sonya ;
Dalal, Ravindra ;
deWinter, Alex ;
Dixon, John ;
Foquet, Mathieu ;
Gaertner, Alfred ;
Hardenbol, Paul ;
Heiner, Cheryl ;
Hester, Kevin ;
Holden, David ;
Kearns, Gregory ;
Kong, Xiangxu ;
Kuse, Ronald ;
Lacroix, Yves ;
Lin, Steven ;
Lundquist, Paul ;
Ma, Congcong ;
Marks, Patrick ;
Maxham, Mark ;
Murphy, Devon ;
Park, Insil ;
Pham, Thang ;
Phillips, Michael ;
Roy, Joy ;
Sebra, Robert ;
Shen, Gene ;
Sorenson, Jon ;
Tomaney, Austin ;
Travers, Kevin ;
Trulson, Mark ;
Vieceli, John ;
Wegener, Jeffrey ;
Wu, Dawn ;
Yang, Alicia ;
Zaccarin, Denis .
SCIENCE, 2009, 323 (5910) :133-138