GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

Cited by: 22
Authors
Rue-Albrecht, Kevin [1, 2]
McGettigan, Paul A. [1, 3]
Hernandez, Belinda [4, 7]
Nalpas, Nicolas C. [1, 5]
Magee, David A. [1]
Parnell, Andrew C. [4]
Gordon, Stephen V. [6, 7]
MacHugh, David E. [1, 7]
Affiliations
[1] Natl Univ Ireland Univ Coll Dublin, UCD Sch Agr & Food Sci, Anim Genom Lab, Dublin 4, Ireland
[2] Univ London Imperial Coll Sci Technol & Med, Ctr Pharmacol & Therapeut, Div Expt Med, Hammersmith Hosp, London W12 0NN, England
[3] Novartis Pharmaceut, Elm Pk Business Campus,Merrion Rd, Dublin 4, Ireland
[4] Natl Univ Ireland Univ Coll Dublin, UCD Sch Math & Stat, Insight Ctr Data Analyt, Dublin 4, Ireland
[5] Univ Tubingen, Proteome Ctr Tubingen, Interfac Inst Cell Biol, Morgenstelle 15, D-72076 Tubingen, Germany
[6] Natl Univ Ireland Univ Coll Dublin, UCD Sch Vet Med, Dublin 4, Ireland
[7] Natl Univ Ireland Univ Coll Dublin, UCD Conway Inst Biomol & Biomed Res, Dublin 4, Ireland
Source
BMC Bioinformatics | 2016 / Vol. 17
Funding
Wellcome Trust (UK); Science Foundation Ireland
Keywords
Gene expression; Gene ontology; Supervised learning; Classification; Microarray; RNA-sequencing; Functional genomics; RNA-SEQ DATA; DIFFERENTIAL EXPRESSION; MACROPHAGE RESPONSE; RANDOM FOREST; BIOCONDUCTOR PACKAGE; GENOMIC ANALYSIS; TOOL; INFECTION; SELECTION; PATHWAYS;
DOI
10.1186/s12859-016-0971-3
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors.
Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples.
Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
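The step of turning per-gene importance into a ranking of gene ontology terms can be illustrated with a minimal Python sketch. It is not the GOexpress implementation (the package itself is written in R), and everything in it is an assumption made for illustration: the gene names, the scores, the term identifiers, the function names, the rule of averaging gene scores and ranks per term, and the treatment of annotated genes that are missing from the expression data (score 0 and a rank one past the last measured gene). In a real analysis the per-gene scores would come from a supervised learner such as a random forest rather than being typed in by hand.

```python
from statistics import mean

# Hypothetical per-gene importance scores, standing in for the variable
# importance a supervised learner would assign (illustrative values only).
gene_scores = {"geneA": 0.9, "geneB": 0.5, "geneC": 0.1, "geneD": 0.0}

# Hypothetical gene ontology annotations: term identifier -> annotated genes.
go_annotations = {
    "GO:term1": ["geneA", "geneB"],
    "GO:term2": ["geneC", "geneD"],
    "GO:term3": ["geneA", "geneX"],  # geneX is absent from the expression data
}


def rank_genes(scores):
    """Rank genes by decreasing score (rank 1 = most important)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def score_terms(annotations, scores):
    """Summarise each term by the mean score and mean rank of its genes.

    Assumption: genes without a score get score 0.0 and a rank one past
    the last measured gene, so they pull their term down the ranking.
    """
    ranks = rank_genes(scores)
    worst_rank = len(scores) + 1
    summary = {}
    for term, genes in annotations.items():
        summary[term] = {
            "mean_score": mean(scores.get(g, 0.0) for g in genes),
            "mean_rank": mean(ranks.get(g, worst_rank) for g in genes),
        }
    return summary


def rank_terms(term_summary):
    """Order terms from best (lowest mean rank) to worst."""
    return sorted(term_summary, key=lambda t: term_summary[t]["mean_rank"])


term_summary = score_terms(go_annotations, gene_scores)
term_order = rank_terms(term_summary)
```

With these toy inputs, a term whose genes are all highly ranked comes first, while a term that contains one strong gene and one unmeasured gene falls between the other two. Ranking by mean rank rather than mean score is one possible design choice; it makes the ordering less sensitive to the scale of the importance values.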
Pages: 12