Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network

Cited by: 38
Authors
Zeng, Jingwen [1]
Cai, Hongmin [1]
Peng, Hong [1]
Wang, Haiyan [1]
Zhang, Yue [2]
Akutsu, Tatsuya [3]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China
[3] Kyoto Univ, Inst Chem Res, Bioinformat Ctr, Kyoto, Japan
Funding
National Natural Science Foundation of China
Keywords
nanopore sequencing; basecalling; deep neural network; temporal convolution; performance comparison; assembly; genome
DOI
10.3389/fgene.2019.01332
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Nanopore sequencing is promising because of its long read length and high speed. During sequencing, a strand of DNA/RNA passes through a biological nanopore, which causes the current in the pore to fluctuate. During basecalling, context-dependent current measurements are translated into the base sequence of the DNA/RNA strand. Accurate and fast basecalling is vital for downstream analyses such as genome assembly and detecting single-nucleotide polymorphisms and genomic structural variants. However, owing to the various changes in DNA/RNA molecules, noise during sequencing, and limitations of basecalling methods, accurate basecalling remains a challenge. In this paper, we propose Causalcall, which uses an end-to-end temporal convolution-based deep learning model for accurate and fast nanopore basecalling. Developed on a temporal convolutional network (TCN) and a connectionist temporal classification decoder, Causalcall directly identifies base sequences of varying lengths from current measurements in long time series. In contrast to the basecalling models using recurrent neural networks (RNNs), the convolution-based model of Causalcall can speed up basecalling by matrix computation. Experiments on multiple species have demonstrated the great potential of the TCN-based model to improve basecalling accuracy and speed when compared to an RNN-based model. Besides, experiments on genome assembly indicate the utility of Causalcall in reference-based genome assembly.
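The two ideas the abstract names, causal temporal convolution and connectionist temporal classification (CTC) decoding, can be illustrated with a small sketch. This is not Causalcall's implementation: the function names, the label layout (A, C, G, T plus a blank at index 4), and the greedy decoding strategy are assumptions made only for illustration.

```python
import numpy as np

BASES = "ACGT"
BLANK = 4  # assumed index of the CTC blank label (illustrative choice)


def causal_dilated_conv1d(signal, kernel, dilation=1):
    """1D causal convolution: output[t] uses only signal[t], signal[t - d], ...

    Left-padding with zeros keeps the output the same length as the input
    and guarantees that no future sample leaks into the current output.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    pad = (len(kernel) - 1) * dilation
    padded = np.concatenate([np.zeros(pad), signal])
    out = np.zeros(len(signal))
    for k, weight in enumerate(kernel):
        # kernel[k] weights the sample k * dilation steps in the past
        start = pad - k * dilation
        out += weight * padded[start:start + len(signal)]
    return out


def ctc_greedy_decode(label_ids):
    """Greedy CTC decoding: collapse repeated labels, then drop blanks."""
    bases = []
    prev = None
    for label in label_ids:
        if label != prev and label != BLANK:
            bases.append(BASES[label])
        prev = label
    return "".join(bases)


if __name__ == "__main__":
    print(causal_dilated_conv1d([1, 2, 3, 4], [1, 1], dilation=2))
    print(ctc_greedy_decode([0, 0, 4, 0, 1, 1, 4, 2, 3, 3]))
```

Because every output position of the convolution is computed from a fixed window of past inputs, all positions can be evaluated together as array operations, which is the property the abstract contrasts with the step-by-step recurrence of an RNN. The blank label is what lets a per-sample label sequence map to a shorter base sequence of varying length, including repeated bases.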
Pages: 11