CGPS: A machine learning-based approach integrating multiple gene set analysis tools for better prioritization of biologically relevant pathways

Cited by: 75
Authors
Ai, Chen [1]
Kong, Lei [1]
Affiliations
[1] Peking Univ, Ctr Bioinformat, Sch Life Sci, State Key Lab Prot & Plant Gene Res, Beijing 100871, Peoples R China
Keywords
Gene expression; Differential expression; Gene set enrichment; Support vector machine; TGF-BETA; EXPRESSION; LEUKEMIA; KEGG; REPRESENTATION; PANOBINOSTAT; INHIBITOR; SURVIVAL; PACKAGE; ALPHA
DOI
10.1016/j.jgg.2018.08.002
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Gene set enrichment (GSE) analyses play an important role in the interpretation of large-scale transcriptome datasets. Because the plethora of GSE tools and their discrepant performance make it challenging to obtain optimal results, multiple GSE tools can be integrated into a single method. However, existing ensemble methods produce several different scores for sorting pathways as integrated results, and it is difficult for users to choose a single ensemble score that yields optimal final results. Here, we develop a machine learning-based ensemble method called Combined Gene set analysis incorporating Prioritization and Sensitivity (CGPS), which integrates the results of nine prominent GSE tools into a single ensemble score (R score) used to sort pathways. Moreover, to the best of our knowledge, CGPS is the first GSE ensemble method built on a priori knowledge of pathways and phenotypes. In an evaluation on 120 simulated datasets and 45 real datasets, sorting pathways by the R score prioritizes relevant pathways better than 10 widely used individual methods and five types of ensemble scores from two ensemble methods. Additionally, CGPS is applied to expression data involving the drug panobinostat, an anticancer treatment for multiple myeloma. The results identify cell processes associated with cancer, such as the p53 signaling pathway (hsa04115); by contrast, two ensemble methods (EnrichmentBrowser and EGSEA) rank this pathway outside the top 20, which may cause users to miss it in their analyses. We show that this method, which is based on a priori knowledge, can capture valuable biological information from numerous types of gene set collections, such as KEGG pathways, GO terms, Reactome, and BioCarta. CGPS is publicly available as standalone source code at ftp://ftp.cbi.pku.edu.cn/pub/CGPS_download/cgps-1.0.0.tar.gz.
Copyright © 2018 The Authors. Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Limited and Science Press.
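The idea of collapsing several tools' outputs into one learned ranking score can be illustrated with a small sketch. This is not the CGPS implementation: the abstract does not specify the features, training data, or classifier settings, so everything below is an assumption made for illustration. Three hypothetical tools stand in for the nine, the p-values are invented toy numbers, and a plain linear SVM trained by subgradient descent stands in for whatever model CGPS uses. The names `to_features`, `train_linear_svm`, and `decision` are hypothetical helpers.

```python
import math

def to_features(pvalues, cap=10.0):
    """Map per-tool p-values to -log10(p), capped so p == 0 stays finite."""
    return [min(-math.log10(max(p, 1e-300)), cap) for p in pvalues]

def decision(w, b, x):
    """Linear decision value; used here as a single ensemble ranking score."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=200, lr=0.05, lam=0.01):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only examples inside the margin contribute to the hinge subgradient.
            if yi * decision(w, b, xi) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy training set (invented): p-values from three hypothetical tools per pathway,
# labelled +1 if the pathway is known a priori to be relevant to the phenotype.
training = [
    ([1e-4, 1e-3, 0.02], +1),
    ([1e-3, 0.01, 1e-3], +1),
    ([0.5, 0.3, 0.8], -1),
    ([0.2, 0.9, 0.6], -1),
]
X = [to_features(p) for p, _ in training]
y = [label for _, label in training]
w, b = train_linear_svm(X, y)

# Score unseen pathways (also invented) and sort by the single learned score.
new_pathways = {
    "pathway_A": [1e-5, 1e-3, 1e-2],
    "pathway_B": [0.6, 0.5, 0.9],
    "pathway_C": [0.01, 0.2, 0.05],
}
scores = {name: decision(w, b, to_features(p)) for name, p in new_pathways.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The point of the sketch is only that a classifier trained on known pathway-phenotype associations gives users one score to sort by, instead of a choice among several ensemble statistics.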
Pages: 489-504
Page count: 16
Source: Journal of Genetics and Genomics, 2018, 45 (09)