GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

Cited by: 22
Authors
Rue-Albrecht, Kevin [1, 2]
McGettigan, Paul A. [1, 3]
Hernandez, Belinda [4, 7]
Nalpas, Nicolas C. [1, 5]
Magee, David A. [1]
Parnell, Andrew C. [4]
Gordon, Stephen V. [6, 7]
MacHugh, David E. [1, 7]
Affiliations
[1] Natl Univ Ireland Univ Coll Dublin, UCD Sch Agr & Food Sci, Anim Genom Lab, Dublin 4, Ireland
[2] Univ London Imperial Coll Sci Technol & Med, Ctr Pharmacol & Therapeut, Div Expt Med, Hammersmith Hosp, London W12 0NN, England
[3] Novartis Pharmaceut, Elm Pk Business Campus,Merrion Rd, Dublin 4, Ireland
[4] Natl Univ Ireland Univ Coll Dublin, UCD Sch Math & Stat, Insight Ctr Data Analyt, Dublin 4, Ireland
[5] Univ Tubingen, Proteome Ctr Tubingen, Interfac Inst Cell Biol, Morgenstelle 15, D-72076 Tubingen, Germany
[6] Natl Univ Ireland Univ Coll Dublin, UCD Sch Vet Med, Dublin 4, Ireland
[7] Natl Univ Ireland Univ Coll Dublin, UCD Conway Inst Biomol & Biomed Res, Dublin 4, Ireland
Source
BMC Bioinformatics | 2016 / Vol. 17
Funding
Wellcome Trust (UK); Science Foundation Ireland
Keywords
Gene expression; Gene ontology; Supervised learning; Classification; Microarray; RNA-sequencing; Functional genomics; RNA-SEQ DATA; DIFFERENTIAL EXPRESSION; MACROPHAGE RESPONSE; RANDOM FOREST; BIOCONDUCTOR PACKAGE; GENOMIC ANALYSIS; TOOL; INFECTION; SELECTION; PATHWAYS;
DOI
10.1186/s12859-016-0971-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors.
Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples.
Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
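The abstract describes ranking gene ontology terms from per-gene scores but does not spell out how the gene scores are summarised per term. The sketch below is one plausible reading, not the package's code: it assumes each expressed gene already has an importance score (for example, from a random forest) and that a term's score is the mean score of its annotated genes. The function name `score_go_terms`, the gene names, the term identifiers, and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def score_go_terms(gene_scores, go_annotations, min_genes=1):
    """Rank ontology terms by the mean score of their annotated, scored genes.

    gene_scores: dict mapping gene -> importance score (assumed precomputed).
    go_annotations: dict mapping gene -> list of ontology term identifiers.
    min_genes: terms with fewer scored genes than this are dropped.
    Returns a list of (term, mean_score, n_scored_genes), best term first.
    """
    per_term = defaultdict(list)
    for gene, terms in go_annotations.items():
        if gene not in gene_scores:
            continue  # gene has no score (e.g. not expressed), so skip it
        for term in terms:
            per_term[term].append(gene_scores[gene])

    ranked = [
        (term, mean(scores), len(scores))
        for term, scores in per_term.items()
        if len(scores) >= min_genes
    ]
    # Highest mean score first; ties broken alphabetically by term identifier.
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked


# Hypothetical importance scores and placeholder annotations.
gene_scores = {"geneA": 0.9, "geneB": 0.5, "geneC": 0.1, "geneD": 0.3}
go_annotations = {
    "geneA": ["GO:example1", "GO:example2"],
    "geneB": ["GO:example1"],
    "geneC": ["GO:example2", "GO:example3"],
    "geneD": ["GO:example3"],
    "geneE": ["GO:example3"],  # annotated but unscored, so it is ignored
}

ranking = score_go_terms(gene_scores, go_annotations)
for term, avg, n in ranking:
    print(f"{term}\t{avg:.2f}\t{n}")
```

Raising `min_genes` is one simple way to keep terms annotated to very few scored genes from dominating the ranking; whether and how the package filters such terms is not stated in the abstract.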
Pages: 12