Association rule mining for genome-wide association studies through Gibbs sampling

Cited by: 0

Authors:

Qian, Guoqi [1]

Sun, Pei-Yun [1]

Affiliations:

[1] Univ Melbourne, Sch Math & Stat, Parkville, Vic 3010, Australia

Source:

INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS | 2023

Keywords:

Gibbs sampling; Association rule mining; Genome-wide association study; Genotype-phenotype association; Epistatic interaction; VARIABLE SELECTION; CHROMOSOME; 9P21; RISK;

DOI:

10.1007/s41060-023-00456-y

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Finding associations between genetic markers and a phenotypic trait such as coronary artery disease (CAD) is of primary interest in genome-wide association studies (GWAS). A major challenge in GWAS is that the genomic data involved often contain a large number of genetic markers and the underlying genotype-phenotype relationship is mostly complex. Current statistical and machine learning methods lack the power to tackle this challenge effectively and efficiently. In this paper, we develop a stochastic search method to mine genotype-phenotype associations from GWAS data. The new method generalizes the well-established association rule mining (ARM) framework to search for the most important genotype-phenotype association rules: we develop a multinomial Gibbs sampling algorithm and use it together with the Apriori algorithm to overcome the overwhelming computational complexity of ARM in GWAS. Three simulation studies based on synthetic data are used to assess the performance of the developed method, and they deliver the anticipated results. Finally, we illustrate the use of the developed method through a case study of CAD GWAS.
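For readers unfamiliar with the ARM framework the abstract refers to, the following is a minimal sketch of the two standard rule-quality measures, support and confidence, applied to a genotype-phenotype rule. It is not the paper's algorithm: it does not implement the multinomial Gibbs sampler or Apriori, and the toy data, column names (`snp1`, `snp2`, `disease`), and genotype coding (0/1/2) are assumptions made purely for illustration.

```python
from typing import Dict, List

# Hypothetical toy data (an assumption, not from the paper): one dict per
# individual, SNP genotypes coded 0/1/2 and a binary disease status.
Row = Dict[str, int]

TOY_ROWS: List[Row] = [
    {"snp1": 2, "snp2": 1, "disease": 1},
    {"snp1": 2, "snp2": 1, "disease": 1},
    {"snp1": 2, "snp2": 0, "disease": 0},
    {"snp1": 0, "snp2": 1, "disease": 0},
]


def support(rows: List[Row], items: Row) -> float:
    """Fraction of rows in which every (column, value) pair in `items` holds."""
    if not rows:
        return 0.0
    hits = sum(all(r.get(k) == v for k, v in items.items()) for r in rows)
    return hits / len(rows)


def confidence(rows: List[Row], antecedent: Row, consequent: Row) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    s_antecedent = support(rows, antecedent)
    if s_antecedent == 0:
        return 0.0  # rule never fires; treat its confidence as 0
    return support(rows, {**antecedent, **consequent}) / s_antecedent


if __name__ == "__main__":
    rule_lhs = {"snp1": 2, "snp2": 1}
    rule_rhs = {"disease": 1}
    print("support   :", support(TOY_ROWS, {**rule_lhs, **rule_rhs}))
    print("confidence:", confidence(TOY_ROWS, rule_lhs, rule_rhs))
```

Enumerating every candidate antecedent in this way becomes infeasible as the number of markers grows, which is the computational burden the abstract says the Gibbs sampling and Apriori combination is meant to address.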

Pages: 14
