Phylogenetic convolutional neural networks in metagenomics

Cited by: 62

Authors:

Fioravanti, Diego [1,2]

Giarratano, Ylenia [3]

Maggio, Valerio [1]

Agostinelli, Claudio [4]

Chierici, Marco [1]

Jurman, Giuseppe [1]

Furlanello, Cesare [1]

Affiliations:

[1] FBK, Via Sommarive 18 Povo, I-38123 Trento, Italy

[2] Max Planck Inst Intelligent Syst, Spemannstr 34, D-72076 Tubingen, Germany

[3] Univ Edinburgh, Ctr Med Informat, Usher Inst, 9 Little France Rd, Edinburgh EH16 4UX, Midlothian, Scotland

[4] Univ Trento, Dept Math, Via Sommarive 14 Povo, I-38123 Trento, Italy

Source:

BMC BIOINFORMATICS | 2018 / Vol. 19

Keywords:

Metagenomics; Deep learning; Convolutional neural networks; Phylogenetic trees; SELECTION; TOOL;

DOI:

10.1186/s12859-018-2033-5

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Background: Convolutional Neural Networks can be used effectively only when the data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space.

Results: Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and to a baseline fully connected neural network, e.g. the Multi-Layer Perceptron.

Conclusion: Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbourhood of each sample, thus mimicking the case of image data, transparently to the user.
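The pipeline outlined in the abstract (patristic distances on a tree, a Euclidean embedding, then neighbour lists fed to a convolution) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy tree, the function names `patristic_distances`, `classical_mds` and `k_nearest`, and the choice of k are all hypothetical, and plain classical MDS stands in for the sparsified MDS mentioned in the abstract. The custom Keras layer is not reproduced; the last step only shows how neighbour indices could be used to gather per-feature patches.

```python
import numpy as np

def patristic_distances(parent, branch_length):
    """Pairwise path lengths between the leaves of a rooted tree.

    parent: dict mapping node -> parent node (the root maps to None).
    branch_length: dict mapping node -> length of the edge to its parent.
    """
    internal = {p for p in parent.values() if p is not None}
    leaves = sorted(n for n in parent if n not in internal)

    def path_to_root(node):
        # Cumulative distance from `node` to itself and to each ancestor.
        dist, out = 0.0, {}
        while node is not None:
            out[node] = dist
            dist += branch_length.get(node, 0.0)
            node = parent[node]
        return out

    paths = {leaf: path_to_root(leaf) for leaf in leaves}
    D = np.zeros((len(leaves), len(leaves)))
    for i, a in enumerate(leaves):
        for j, b in enumerate(leaves):
            shared = set(paths[a]) & set(paths[b])
            # With non-negative branch lengths, the minimum is reached
            # at the lowest common ancestor.
            D[i, j] = min(paths[a][c] + paths[b][c] for c in shared)
    return leaves, D

def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS; an assumption standing in for sparsified MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred Gram matrix
    w, v = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]               # largest eigenvalues first
    return v[:, top] * np.sqrt(np.clip(w[top], 0.0, None))

def k_nearest(coords, k):
    """Indices of the k closest points to each point (the point itself included)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

# Hypothetical toy tree: root R with internal nodes X (leaves A, B) and Y (leaves C, D).
parent = {"R": None, "X": "R", "Y": "R", "A": "X", "B": "X", "C": "Y", "D": "Y"}
branch_length = {"X": 1.0, "Y": 1.0, "A": 1.0, "B": 1.0, "C": 1.0, "D": 2.0}

leaves, D = patristic_distances(parent, branch_length)
coords = classical_mds(D, dim=2)
neighbours = k_nearest(coords, k=2)

# Two made-up samples with one abundance value per leaf (feature).
X = np.arange(8.0).reshape(2, 4)
patches = X[:, neighbours]   # shape: (samples, features, k)
```

Each row of `patches[s]` holds a feature together with its nearest neighbours in the embedded space, which is the kind of local window a convolutional filter of width k could slide over.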

Pages: 13
