ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions

Times cited: 79
Authors
Tan, Jie [1]
Hammond, John H. [2]
Hogan, Deborah A. [2]
Greene, Casey S. [1,3]
Affiliations
[1] Geisel Sch Med Dartmouth, Dept Genet, Hanover, NH 03755 USA
[2] Geisel Sch Med Dartmouth, Dept Microbiol & Immunol, Hanover, NH USA
[3] Univ Penn, Dept Syst Pharmacol & Translat Therapeut, Perelman Sch Med, Philadelphia, PA 19104 USA
Funding
National Institutes of Health (USA)
Keywords
genomics; denoising autoencoders; bioinformatics; gene expression; data integration; INDEPENDENT COMPONENT ANALYSIS; BREAST-CANCER; ANR; NETWORK; MICROARRAYS; ONTOLOGY; GROWTH; LINKS; TOOL;
DOI
10.1128/mSystems.00025-15
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
The increasing number of genome-wide assays of gene expression available from public databases presents opportunities for computational methods that facilitate hypothesis generation and biological interpretation of these data. We present an unsupervised machine learning approach, ADAGE (analysis using denoising autoencoders of gene expression), and apply it to the publicly available gene expression data compendium for Pseudomonas aeruginosa. In this approach, the machine-learned ADAGE model contained 50 nodes, which we predicted would correspond to gene expression patterns across the gene expression compendium. While no biological knowledge was used during model construction, co-operonic genes had similar weights across nodes, and genes with similar weights across nodes were significantly more likely to share KEGG pathways. By analyzing newly generated and previously published microarray and transcriptome sequencing data, the ADAGE model identified differences between strains, modeled the cellular response to low oxygen, and predicted the involvement of biological processes based on low-level gene expression differences. ADAGE compared favorably with traditional principal component analysis and independent component analysis approaches in its ability to extract validated patterns, and based on our analyses, we propose that these approaches differ in the types of patterns they preferentially identify. We provide the ADAGE model with analysis of all publicly available P. aeruginosa GeneChip experiments and open source code for use with other species and settings. Extraction of consistent patterns across large-scale collections of genomic data using methods like ADAGE provides the opportunity to identify general principles and biologically important patterns in microbial biology. This approach will be particularly useful in less-well-studied microbial species.
IMPORTANCE The quantity and breadth of genome-scale data sets that examine RNA expression in diverse bacterial and eukaryotic species are increasing more rapidly than curated knowledge. Our ADAGE method integrates such data without requiring gene function, gene pathway, or experiment labeling, which makes it practical to apply to any large gene expression compendium. We built a Pseudomonas aeruginosa ADAGE model from a diverse set of publicly available experiments without any prespecified biological knowledge, and this model was accurate and predictive. We provide ADAGE results for the complete P. aeruginosa GeneChip compendium for use by researchers studying P. aeruginosa and source code that facilitates ADAGE's application to other species and data types.
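The denoising-autoencoder idea summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' released code: only the 50-node default comes from the abstract, and everything else (tied encoder/decoder weights, sigmoid units, masking noise, a cross-entropy loss, the learning rate, the function and parameter names) is an assumption chosen for illustration. The input is assumed to be a samples-by-genes matrix with each gene scaled to the range [0, 1].

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reconstruct(X, W, b_hidden, b_visible):
    """Encode expression profiles into node activities, then decode them."""
    hidden = sigmoid(X @ W + b_hidden)
    return sigmoid(hidden @ W.T + b_visible)


def cross_entropy(X, recon, eps=1e-9):
    """Mean per-element cross-entropy between inputs and reconstructions."""
    return float(-np.mean(X * np.log(recon + eps)
                          + (1 - X) * np.log(1 - recon + eps)))


def train_denoising_autoencoder(X, n_nodes=50, corruption=0.1, lr=0.1,
                                epochs=50, batch_size=10, seed=0):
    """Train a one-layer, tied-weight denoising autoencoder (illustrative).

    X is samples x genes with values in [0, 1]. All hyperparameters other
    than n_nodes=50 are hypothetical defaults, not values from the paper.
    Returns W (genes x nodes) plus the hidden and visible bias vectors.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W = rng.normal(0.0, 0.01, size=(n_genes, n_nodes))
    b_hidden = np.zeros(n_nodes)
    b_visible = np.zeros(n_genes)

    for _ in range(epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch = X[order[start:start + batch_size]]
            # Masking noise: set a random fraction of the inputs to zero.
            noisy = batch * (rng.random(batch.shape) >= corruption)
            hidden = sigmoid(noisy @ W + b_hidden)
            recon = sigmoid(hidden @ W.T + b_visible)
            # Gradient of the cross-entropy loss at the output pre-activation,
            # measured against the clean (uncorrupted) batch.
            d_out = (recon - batch) / len(batch)
            d_hidden = (d_out @ W) * hidden * (1.0 - hidden)
            # Tied weights: W gets gradient from both encoder and decoder.
            grad_W = noisy.T @ d_hidden + d_out.T @ hidden
            W -= lr * grad_W
            b_hidden -= lr * d_hidden.sum(axis=0)
            b_visible -= lr * d_out.sum(axis=0)
    return W, b_hidden, b_visible
```

In this sketch each row of `W` holds one gene's weights across the nodes, so comparing rows is one way to ask whether two genes have similar weight profiles, which is the kind of comparison the abstract describes for co-operonic genes and shared KEGG pathways.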
Pages: 17