Phylogenetic convolutional neural networks in metagenomics

Cited by: 62
Authors
Fioravanti, Diego [1, 2]
Giarratano, Ylenia [3]
Maggio, Valerio [1]
Agostinelli, Claudio [4]
Chierici, Marco [1]
Jurman, Giuseppe [1]
Furlanello, Cesare [1]
Affiliations
[1] FBK, Via Sommarive 18 Povo, I-38123 Trento, Italy
[2] Max Planck Inst Intelligent Syst, Spemannstr 34, D-72076 Tubingen, Germany
[3] Univ Edinburgh, Ctr Med Informat, Usher Inst, 9 Little France Rd, Edinburgh EH16 4UX, Midlothian, Scotland
[4] Univ Trento, Dept Math, Via Sommarive 14 Povo, I-38123 Trento, Italy
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Keywords
Metagenomics; Deep learning; Convolutional neural networks; Phylogenetic trees; SELECTION; TOOL;
DOI
10.1186/s12859-018-2033-5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Background: Convolutional Neural Networks can be used effectively only when the data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space.
Results: Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms such as Support Vector Machines and Random Forest and to a baseline fully connected neural network, the Multi-Layer Perceptron.
Conclusion: Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbours of each sample, thus mimicking the case of image data, transparently to the user.
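The embedding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy tree, the branch lengths, and the function names `path_to_root`, `patristic_matrix`, `classical_mds`, and `ranked_neighbours` are all hypothetical. Plain classical MDS stands in for the sparsified variant named in the abstract, and the Keras layer is not reproduced. The patristic distance between two leaves is taken as the sum of branch lengths on the path joining them.

```python
import numpy as np

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 2.0),
    "D": ("n2", 0.5),
    "n1": ("root", 0.5),
    "n2": ("root", 1.0),
}
LEAVES = ["A", "B", "C", "D"]


def path_to_root(node):
    """Map each ancestor of `node` (and node itself) to its distance from `node`."""
    dist, out = 0.0, {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic_matrix(leaves):
    """Pairwise patristic distances between the given leaves."""
    paths = [path_to_root(leaf) for leaf in leaves]
    n = len(leaves)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # With non-negative branch lengths, the lowest common ancestor
            # minimises the summed distance over all shared ancestors.
            shared = (a for a in paths[i] if a in paths[j])
            D[i, j] = D[j, i] = min(paths[i][a] + paths[j][a] for a in shared)
    return D


def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS; an assumption, not the sparsified MDS of the paper."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dim]          # keep the `dim` largest
    vals = np.clip(vals[order], 0.0, None)        # drop negative eigenvalues
    return vecs[:, order] * np.sqrt(vals)


def ranked_neighbours(D, k):
    """Indices of the k closest leaves for each leaf, nearest first (self included)."""
    return np.argsort(D, axis=1, kind="stable")[:, :k]


D = patristic_matrix(LEAVES)
coords = classical_mds(D, dim=2)      # one 2-D point per leaf
neighbours = ranked_neighbours(D, k=3)
```

In this sketch each row of `neighbours` plays the role that a pixel's surrounding window plays in an image: a convolution over tree-structured variables could gather inputs in that ranked order rather than by grid position.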
Pages: 13