GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

Cited by: 22
Authors
Rue-Albrecht, Kevin [1, 2]
McGettigan, Paul A. [1, 3]
Hernandez, Belinda [4, 7]
Nalpas, Nicolas C. [1, 5]
Magee, David A. [1]
Parnell, Andrew C. [4]
Gordon, Stephen V. [6, 7]
MacHugh, David E. [1, 7]
Affiliations
[1] Natl Univ Ireland Univ Coll Dublin, UCD Sch Agr & Food Sci, Anim Genom Lab, Dublin 4, Ireland
[2] Univ London Imperial Coll Sci Technol & Med, Ctr Pharmacol & Therapeut, Div Expt Med, Hammersmith Hosp, London W12 0NN, England
[3] Novartis Pharmaceut, Elm Pk Business Campus, Merrion Rd, Dublin 4, Ireland
[4] Natl Univ Ireland Univ Coll Dublin, UCD Sch Math & Stat, Insight Ctr Data Analyt, Dublin 4, Ireland
[5] Univ Tubingen, Proteome Ctr Tubingen, Interfac Inst Cell Biol, Morgenstelle 15, D-72076 Tubingen, Germany
[6] Natl Univ Ireland Univ Coll Dublin, UCD Sch Vet Med, Dublin 4, Ireland
[7] Natl Univ Ireland Univ Coll Dublin, UCD Conway Inst Biomol & Biomed Res, Dublin 4, Ireland
Source
BMC BIOINFORMATICS | 2016 / Volume 17
Funding
Wellcome Trust (UK); Science Foundation Ireland;
Keywords
Gene expression; Gene ontology; Supervised learning; Classification; Microarray; RNA-sequencing; Functional genomics; RNA-SEQ DATA; DIFFERENTIAL EXPRESSION; MACROPHAGE RESPONSE; RANDOM FOREST; BIOCONDUCTOR PACKAGE; GENOMIC ANALYSIS; TOOL; INFECTION; SELECTION; PATHWAYS;
DOI
10.1186/s12859-016-0971-3
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
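The ranking idea described in the abstract (score every expressed gene by how well it separates the sample groups, then summarise those scores per gene ontology term) can be illustrated with a minimal sketch. This is not the GOexpress API, and it is written in Python rather than R. It swaps the default random forest for a plain one-way ANOVA F statistic to stay dependency-free, and it assumes that a term is summarised by the average rank of its annotated genes. All gene names, GO identifiers, expression values, and function names below are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: normalised expression per gene, one value per sample.
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.1, 5.3, 4.8],
    "geneB": [2.0, 2.1, 1.9, 2.3, 2.2, 2.1],
    "geneC": [3.0, 3.5, 2.8, 6.0, 6.4, 5.9],
    "geneD": [4.0, 1.0, 3.0, 2.0, 4.5, 1.5],
}
# Phenotypic information: the experimental group of each sample.
groups = ["control"] * 3 + ["infected"] * 3

# Hypothetical gene ontology annotations (term -> annotated genes).
go_annotations = {
    "GO:0000001": ["geneA", "geneC"],
    "GO:0000002": ["geneB", "geneD"],
    "GO:0000003": ["geneA", "geneB"],
}


def f_statistic(values, labels):
    """One-way ANOVA F statistic of one gene across the sample groups."""
    by_group = {}
    for value, label in zip(values, labels):
        by_group.setdefault(label, []).append(value)
    grand_mean = mean(values)
    n_groups, n_samples = len(by_group), len(values)
    between = sum(
        len(vals) * (mean(vals) - grand_mean) ** 2 for vals in by_group.values()
    )
    within = 0.0
    for vals in by_group.values():
        group_mean = mean(vals)
        within += sum((v - group_mean) ** 2 for v in vals)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (n_groups - 1)) / (within / (n_samples - n_groups))


def rank_genes(expr, labels):
    """Score every gene, then rank genes from best (1) to worst."""
    scores = {gene: f_statistic(vals, labels) for gene, vals in expr.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {gene: position + 1 for position, gene in enumerate(ordered)}
    return scores, ranks


def score_go_terms(annotations, scores, ranks):
    """Summarise each term by the average rank and score of its genes.

    Simplification (assumption): genes missing from the expression data
    are ignored rather than penalised.
    """
    rows = []
    for term, genes in annotations.items():
        present = [g for g in genes if g in ranks]
        if not present:
            continue
        avg_rank = mean(ranks[g] for g in present)
        avg_score = mean(scores[g] for g in present)
        rows.append((term, avg_rank, avg_score))
    return sorted(rows, key=lambda row: row[1])  # best average rank first


gene_scores, gene_ranks = rank_genes(expression, groups)
go_table = score_go_terms(go_annotations, gene_scores, gene_ranks)
for term, avg_rank, avg_score in go_table:
    print(f"{term}\taverage rank {avg_rank:.1f}\taverage score {avg_score:.1f}")
```

Because every expressed gene is scored before the terms are summarised, the comparison is competitive: a term ranks highly only if its genes outperform the rest of the dataset. A random forest variable-importance measure could replace `f_statistic` without changing the summarisation step.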
Pages: 12