Multi-Layer and Recursive Neural Networks for Metagenomic Classification

Cited by: 64
Authors
Ditzler, Gregory [1]
Polikar, Robi [2]
Rosen, Gail [1]
Affiliations
[1] Drexel Univ, Dept Elect & Comp Engn, Philadelphia, PA 19104 USA
[2] Rowan Univ, Dept Elect & Comp Engn, Glassboro, NJ 08028 USA
Funding
National Science Foundation (USA);
Keywords
Comparative metagenomics; metagenomics; microbiome; neural networks; BACTERIAL; COMMUNITIES; ALGORITHM; PROTEIN; SERVER;
DOI
10.1109/TNB.2015.2461219
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers that learn nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: i) a deep belief network, and ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multi-layer perceptron, which is well established in the machine learning community as a powerful prediction algorithm, though it is largely absent from the metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not result in improvements in classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and provide insight on how they can be improved for predictive metagenomic analysis.
Pages: 608-616
Page count: 9