A System-Level Pathway-Phenotype Association Analysis Using Synthetic Feature Random Forest

Cited by: 14
Authors
Pan, Qinxin [1]
Hu, Ting [2]
Malley, James D. [3]
Andrew, Angeline S. [2, 4]
Karagas, Margaret R. [2, 4]
Moore, Jason H. [1, 2, 4]
Affiliations
[1] Dartmouth Coll, Geisel Sch Med, Dept Genet, Hanover, NH 03755 USA
[2] Dartmouth Coll, Inst Quantitat Biomed Sci, Hanover, NH 03755 USA
[3] NIH, Div Computat Biosci, Ctr Informat Technol, Bethesda, MD 20892 USA
[4] Dartmouth Coll, Geisel Sch Med, Dept Community & Family Med, Hanover, NH 03755 USA
Funding
US National Institutes of Health (NIH)
Keywords
statistical epistasis network (SEN); interactions; synthetic feature random forest (SF-RF); epistasis; pathway analysis; GENOME-WIDE ASSOCIATION; SET ENRICHMENT ANALYSIS; BLADDER-CANCER RISK; TELOMERASE ACTIVITY; GENE-EXPRESSION; NETWORKS; DISEASES; CLASSIFICATION; INFORMATION; COMPLEXITY;
DOI
10.1002/gepi.21794
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
As the cost of genome-wide genotyping decreases, the number of genome-wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. Because of its system-level interpretability, pathway analysis has become a popular tool for gaining insight into the underlying biology from high-throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single-marker statistics and assume that pathways are independent of each other. As biological systems are driven by complex biomolecular interactions, the complex relationships between single-nucleotide polymorphisms (SNPs) and pathways need to be taken into account. To incorporate the complexity of gene-gene interactions and pathway-pathway relationships, we propose a system-level pathway analysis approach, synthetic feature random forest (SF-RF), which is designed to detect pathway-phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are then analyzed simultaneously using RF, and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF-RF with pathway-based Statistical Epistasis Network (SEN) analysis, which evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway-phenotype association. We apply SF-RF to a population-based genetic study of bladder cancer and further investigate the mechanisms that help explain the pathway-phenotype associations using SEN. The bladder cancer-associated pathways we found are consistent with existing biological knowledge and also suggest novel, plausible hypotheses for future biological validation.
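The two-stage idea in the abstract (one RF per pathway to build a synthetic feature, then one RF across all synthetic features) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the synthetic feature is computed, so the sketch assumes it is the out-of-bag predicted case probability from the per-pathway forest, and it uses scikit-learn's impurity-based importance as a stand-in for the paper's significance measure. The simulated genotypes, the pathway names, and the function names `synthetic_features` and `rank_pathways` are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def synthetic_features(genotypes, phenotype, pathways, n_trees=200, seed=0):
    """Collapse each pathway's SNP columns into one synthetic feature.

    Assumption: the synthetic feature is the out-of-bag (OOB) probability
    of being a case, predicted by an RF trained on that pathway's SNPs only.
    """
    columns = []
    for snp_idx in pathways.values():
        rf = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed
        )
        rf.fit(genotypes[:, snp_idx], phenotype)
        # Column 1 holds the OOB probability of class 1 (case).
        columns.append(rf.oob_decision_function_[:, 1])
    return np.column_stack(columns)


def rank_pathways(features, phenotype, names, n_trees=200, seed=0):
    """Second-stage RF over all synthetic features at once.

    Assumption: impurity-based importance stands in for the paper's
    significance measure, which the abstract does not describe.
    """
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(features, phenotype)
    return dict(zip(names, rf.feature_importances_))


# Hypothetical simulated data: 300 subjects, 12 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(300, 12))

# Case status depends on an interaction between SNP 0 and SNP 1,
# with 10% of labels flipped as noise.
signal = (genotypes[:, 0] + genotypes[:, 1]) % 2 == 0
noise = rng.random(300) < 0.1
phenotype = np.logical_xor(signal, noise).astype(int)

# Hypothetical pathway definitions: pathway name -> SNP column indices.
pathways = {
    "pathway_A": [0, 1, 2, 3],
    "pathway_B": [4, 5, 6, 7],
    "pathway_C": [8, 9, 10, 11],
}

features = synthetic_features(genotypes, phenotype, pathways)
importance = rank_pathways(features, phenotype, list(pathways))

for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Using out-of-bag predictions in the first stage keeps each subject's synthetic feature from being computed by trees that saw that subject's label, which would otherwise inflate every pathway's apparent signal. One way to attach a p-value to each pathway, again an assumption rather than something stated in the abstract, would be to permute the phenotype labels and recompute the importances to form a null distribution.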
Pages: 209-219
Number of pages: 11