MetaVelvet-DL: a MetaVelvet deep learning extension for de novo metagenome assembly

Cited by: 14
Authors
Liang, Kuo-ching [1]
Sakakibara, Yasubumi [1]
Affiliations
[1] Keio Univ, Dept Biosci & Informat, Kohoku Ku, 3-14-1 Hiyoshi, Yokohama, Kanagawa 2238522, Japan
Keywords
Metagenome analysis; de novo assembly; Deep learning; de Bruijn graph; Long short-term memory; Convolutional neural network; Long-range correlations
DOI
10.1186/s12859-020-03737-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The increasing use of whole metagenome sequencing has spurred the need to improve de novo assemblers to facilitate the discovery of unknown species and the analysis of their genomic functions. MetaVelvet-SL is a short-read de novo metagenome assembler that partitions a multi-species de Bruijn graph into single-species sub-graphs. This study aimed to improve the performance of MetaVelvet-SL by using a deep learning-based model to predict the partition nodes in a multi-species de Bruijn graph.
Results: This study showed that recent advances in deep learning offer the opportunity to better exploit sequence information and to differentiate the genomes of different species in a metagenomic sample. We developed an extension to MetaVelvet-SL, named MetaVelvet-DL, that builds an end-to-end architecture using Convolutional Neural Network and Long Short-Term Memory units. The deep learning model in MetaVelvet-DL predicts how to partition a de Bruijn graph more accurately than the Support Vector Machine-based model in MetaVelvet-SL. Assembly of the Critical Assessment of Metagenome Interpretation (CAMI) dataset showed that, after removing chimeric assemblies, MetaVelvet-DL produced longer single-species contigs, with fewer misassembled contigs, than MetaVelvet-SL did.
Conclusions: MetaVelvet-DL provides more accurate de novo assemblies of whole metagenome data. The authors believe that this improvement can further the understanding of microbiomes by providing a more accurate description of the metagenomic samples under analysis.
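
To make the idea of a multi-species de Bruijn graph concrete, the following is a minimal toy sketch in Python. It is not the MetaVelvet-SL or MetaVelvet-DL implementation. The function names `build_de_bruijn` and `candidate_partition_nodes`, the two toy reads, and the simple degree-based rule for flagging nodes are all assumptions made for illustration; the abstract only states that a learned model (SVM in MetaVelvet-SL, CNN and LSTM in MetaVelvet-DL) predicts the partition nodes.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph whose nodes are (k-1)-mers.

    Returns two dicts: successors and predecessors of each node.
    (Hypothetical helper for illustration only.)
    """
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            succ[left].add(right)
            pred[right].add(left)
    return succ, pred


def candidate_partition_nodes(succ, pred):
    """Flag nodes with more than one predecessor and more than one successor.

    Assumption: this degree rule only marks places where paths from
    different genomes may cross; it stands in for the learned classifier
    and does not reproduce it.
    """
    nodes = set(succ) | set(pred)
    return sorted(
        n for n in nodes
        if len(succ.get(n, ())) > 1 and len(pred.get(n, ())) > 1
    )


# Two toy reads standing in for two species that share the 3-mer "CGT".
reads = ["ACCGTAG", "TTCGTCA"]
succ, pred = build_de_bruijn(reads, k=4)
print(candidate_partition_nodes(succ, pred))
```

In this toy graph the shared node joins two otherwise separate paths, so splitting the graph at that node yields one path per read. Real metagenomic graphs are far larger and noisier, which is why a trained model rather than a fixed rule is used to decide where to partition.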
Pages: 21