Association rule mining for genome-wide association studies through Gibbs sampling

Times cited: 0
Authors
Qian, Guoqi [1]
Sun, Pei-Yun [1]
Affiliation
[1] Univ Melbourne, Sch Math & Stat, Parkville, Vic 3010, Australia
Keywords
Gibbs sampling; Association rule mining; Genome-wide association study; Genotype-phenotype association; Epistatic interaction; Variable selection; Chromosome; 9p21; Risk
DOI
10.1007/s41060-023-00456-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Finding associations between genetic markers and a phenotypic trait such as coronary artery disease (CAD) is of primary interest in genome-wide association studies (GWAS). A major challenge in GWAS is that the genomic data involved often contain a large number of genetic markers, and the underlying genotype-phenotype relationship is typically complex. Current statistical and machine learning methods lack the power to tackle this challenge effectively and efficiently. In this paper, we develop a stochastic search method to mine genotype-phenotype associations from GWAS data. The new method generalizes the well-established association rule mining (ARM) framework to search for the most important genotype-phenotype association rules: we develop a multinomial Gibbs sampling algorithm and use it together with the Apriori algorithm to overcome the overwhelming computational complexity of ARM in GWAS. Three simulation studies based on synthetic data are used to assess the performance of the developed method, and they deliver the anticipated results. Finally, we illustrate the use of the developed method through a case study of CAD GWAS.
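The association-rule view described in the abstract can be illustrated with a minimal sketch of plain Apriori over genotype items. It assumes SNPs coded as genotype counts 0/1/2 and a binary phenotype (1 = case), and it uses hypothetical toy data and hypothetical helper names (`transactions`, `support`, `apriori_rules`). It shows only the standard support/confidence search for rules of the form {SNP genotypes} => {case}; it is not the authors' multinomial Gibbs sampler, whose details the abstract does not give.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): rows are individuals, columns are
# SNPs coded as genotype counts 0/1/2; phenotype is 1 for cases, 0 for controls.
genotypes = [
    [0, 2, 1],
    [0, 2, 0],
    [1, 2, 1],
    [0, 1, 1],
    [2, 0, 0],
    [1, 0, 2],
]
phenotype = [1, 1, 1, 0, 0, 0]


def transactions(genotypes, phenotype):
    """Encode each individual as a set of items (snp_index, genotype) plus ("Y", phenotype)."""
    return [
        frozenset((j, g) for j, g in enumerate(row)) | {("Y", y)}
        for row, y in zip(genotypes, phenotype)
    ]


def support(itemset, txns):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in txns) / len(txns)


def apriori_rules(txns, min_support, min_confidence, target=("Y", 1)):
    """Level-wise Apriori search for rules X => target, X a set of genotype items."""
    # Level 1: single genotype items whose rule support supp(X u {target}) is high enough.
    items = {i for t in txns for i in t if i[0] != "Y"}
    level = [frozenset([i]) for i in items
             if support(frozenset([i, target]), txns) >= min_support]
    rules = []
    k = 1
    while level:
        for x in level:
            s_xy = support(x | {target}, txns)
            s_x = support(x, txns)
            conf = s_xy / s_x if s_x else 0.0  # confidence = P(target | X)
            if conf >= min_confidence:
                rules.append((x, s_xy, conf))
        # Join step: merge pairs of k-itemsets into (k+1)-itemsets, skipping any set
        # that would hold two genotypes of the same SNP (they cannot co-occur).
        frequent = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            if len(c) == k + 1 and len({snp for snp, _ in c}) == k + 1:
                # Prune step (Apriori property): every k-subset must itself be frequent.
                if all(frozenset(s) in frequent for s in combinations(c, k)):
                    candidates.add(c)
        level = [c for c in candidates if support(c | {target}, txns) >= min_support]
        k += 1
    return rules


if __name__ == "__main__":
    txns = transactions(genotypes, phenotype)
    for antecedent, s, c in apriori_rules(txns, min_support=0.3, min_confidence=0.8):
        print(sorted(antecedent), "=> case", f"support={s:.2f}", f"confidence={c:.2f}")
```

Exhaustive level-wise enumeration of this kind grows combinatorially with the number of SNPs, which is the computing burden the abstract says the multinomial Gibbs sampling component is meant to relieve when combined with Apriori.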
Pages: 14