A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data

Cited by: 2
Authors
Stingo, Francesco C. [1 ]
Swartz, Michael D. [2 ]
Vannucci, Marina [3 ]
Affiliations
[1] MD Anderson Canc Ctr, Dept Biostat, Houston, TX 77030 USA
[2] UT Sch Publ Hlth, Dept Biostat, Houston, TX 77030 USA
[3] Rice Univ, Dept Stat, Houston, TX 77251 USA
Funding
National Science Foundation (USA)
Keywords
Bayesian variable selection; Hardy-Weinberg equilibrium law; Linear models; Linkage disequilibrium; Markov random field; SNP data; GENOME-WIDE ASSOCIATION; LUNG-CANCER; VARIABLE SELECTION; MISSING HERITABILITY; CANDIDATE GENE; SUSCEPTIBILITY LOCUS; STOCHASTIC SEARCH; RARE VARIANTS; LINEAR-MODELS; RISK;
DOI
10.4310/SII.2015.v8.n2.a2
Chinese Library Classification number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Complex diseases, such as cancer, arise from complex etiologies involving multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analysing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association study (GWAS) data. In this paper we propose a novel Bayesian modeling framework to identify molecular biomarkers for disease prediction. Our method combines pathway-based approaches with multiple-SNP analyses of a specified region of interest. The model's development is motivated by SNP data from a lung cancer study. In our approach we define gene-level scores based on SNP allele frequencies and use a linear modeling setting to study the scores' association with the observed phenotype. The basic idea behind the definition of gene-level scores is to weight the SNPs within a gene according to their rarity, based on the genotype frequencies expected under the Hardy-Weinberg equilibrium law. This results in scores that give more importance to unusually low frequencies, i.e. to SNPs that might indicate peculiar genetic differences between subjects belonging to different groups. An additional feature of our approach is that we incorporate information on SNP-to-SNP associations into the model. In particular, we use network priors that model the linkage disequilibrium between SNPs. For posterior inference, we design a stochastic search method that identifies significant biomarkers (genes and SNPs) for disease prediction. We assess performance on simulated data and compare the results to existing approaches. We then show the ability of the proposed methodology to detect relevant genes and associated SNPs in a lung cancer dataset.
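The abstract describes the gene-level score only in words, so the following is a minimal illustrative sketch and not the authors' actual definition. It assumes genotypes coded as 0, 1 or 2 copies of the counted allele, estimates each SNP's allele frequency from the sample, and uses a hypothetical weight, the negative log of the genotype frequency expected under Hardy-Weinberg equilibrium, so that rarer genotypes contribute more to a subject's score. The function names are invented for this example.

```python
import math


def hwe_genotype_freqs(p):
    """Expected frequencies of genotypes carrying 0, 1 and 2 copies of an
    allele with frequency p, under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def allele_freq(genotypes):
    """Sample allele frequency from genotypes coded 0/1/2."""
    return sum(genotypes) / (2.0 * len(genotypes))


def gene_scores(genotype_matrix):
    """genotype_matrix[i][j] is the genotype of subject i at SNP j of one gene.

    Returns one score per subject: the sum over SNPs of -log(expected HWE
    frequency of the observed genotype). This weighting is an assumption
    made for illustration, not the formula from the paper.
    """
    n_snps = len(genotype_matrix[0])
    expected = []
    for j in range(n_snps):
        p = allele_freq([row[j] for row in genotype_matrix])
        # Keep p away from 0 and 1 so every log below is finite.
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        expected.append(hwe_genotype_freqs(p))
    return [
        sum(-math.log(expected[j][g]) for j, g in enumerate(row))
        for row in genotype_matrix
    ]


# Toy data: 4 subjects, 2 SNPs in a single gene.
toy = [[0, 0], [0, 1], [1, 0], [2, 0]]
scores = gene_scores(toy)
```

In this toy example the subjects carrying a less common genotype (the second and fourth rows) receive larger scores than the subject who is homozygous for the common allele at both SNPs. In the framework the abstract describes, scores of this kind would then enter a linear model for the phenotype, with variable selection over genes and SNPs.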
Pages: 137-151
Page count: 15