Association rule mining for genome-wide association studies through Gibbs sampling

Cited by: 0
Authors
Qian, Guoqi [1]
Sun, Pei-Yun [1]
Affiliations
[1] Univ Melbourne, Sch Math & Stat, Parkville, Vic 3010, Australia
Keywords
Gibbs sampling; Association rule mining; Genome-wide association study; Genotype-phenotype association; Epistatic interaction; Variable selection; Chromosome; 9p21; Risk
DOI
10.1007/s41060-023-00456-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Finding associations between genetic markers and a phenotypic trait such as coronary artery disease (CAD) is of primary interest in genome-wide association studies (GWAS). A major challenge in GWAS is that the genomic data involved often contain a large number of genetic markers, and the underlying genotype-phenotype relationship is mostly complex. Current statistical and machine learning methods lack the power to tackle this challenge effectively and efficiently. In this paper, we develop a stochastic search method to mine genotype-phenotype associations from GWAS data. The new method generalizes the well-established association rule mining (ARM) framework to search for the most important genotype-phenotype association rules: we develop a multinomial Gibbs sampling algorithm and use it together with the Apriori algorithm to overcome the overwhelming computational complexity of ARM in GWAS. Three simulation studies based on synthetic data are used to assess the performance of the developed method, and they deliver the anticipated results. Finally, we illustrate the use of the developed method through a case study of CAD GWAS.
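For readers unfamiliar with ARM, the following minimal sketch shows how the standard rule-quality measures (support, confidence, and lift) could be computed for a single genotype-phenotype rule. It is not the authors' method: it involves no Gibbs sampling and no Apriori search, the toy data are invented for illustration, and the names `rows`, `matches`, and `rule_measures` are hypothetical.

```python
from typing import Dict, List

# Toy data, made up for illustration: one row per individual.
# Genotypes are coded as minor-allele counts (0, 1, 2); cad = 1 is a case.
rows: List[Dict[str, int]] = [
    {"snp1": 2, "snp2": 1, "cad": 1},
    {"snp1": 2, "snp2": 1, "cad": 1},
    {"snp1": 2, "snp2": 0, "cad": 0},
    {"snp1": 0, "snp2": 1, "cad": 0},
    {"snp1": 2, "snp2": 1, "cad": 0},
    {"snp1": 1, "snp2": 2, "cad": 1},
    {"snp1": 0, "snp2": 0, "cad": 0},
    {"snp1": 2, "snp2": 1, "cad": 1},
]


def matches(row: Dict[str, int], items: Dict[str, int]) -> bool:
    """Return True if the row carries every (variable, value) pair in items."""
    return all(row[name] == value for name, value in items.items())


def rule_measures(
    rows: List[Dict[str, int]],
    antecedent: Dict[str, int],
    consequent: Dict[str, int],
) -> Dict[str, float]:
    """Compute support, confidence, and lift for antecedent -> consequent."""
    n = len(rows)
    n_a = sum(matches(r, antecedent) for r in rows)
    n_c = sum(matches(r, consequent) for r in rows)
    n_ac = sum(matches(r, antecedent) and matches(r, consequent) for r in rows)

    support = n_ac / n                           # P(antecedent and consequent)
    confidence = n_ac / n_a if n_a else 0.0      # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_a and n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical rule: {snp1 = 2, snp2 = 1} -> {cad = 1}
measures = rule_measures(rows, {"snp1": 2, "snp2": 1}, {"cad": 1})
print(measures)
```

With thousands of markers, the number of candidate antecedents grows combinatorially, so enumerating and scoring every rule in this way becomes infeasible. That is the computational burden the abstract describes the stochastic search as addressing.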
Pages: 14