Leveraging pleiotropic association using sparse group variable selection in genomics data

Cited by: 1
Authors
Sutton, Matthew [1]
Sugier, Pierre-Emmanuel [2, 3]
Truong, Therese [3]
Liquet, Benoit [2, 4]
Affiliations
[1] Queensland Univ Technol, Ctr Data Sci, Brisbane, Qld, Australia
[2] CNRS, Lab Mathemat & Leurs Applicat PAU UPP E2S, Pau, France
[3] Univ Paris Saclay, UVSQ, INSERM, Gustave Roussy, CESP, Team Exposome & Heredity, Villejuif, France
[4] Macquarie Univ, Dept Math & Stat, Sydney, NSW, Australia
Keywords
Genetic epidemiology; High dimensional data; Lasso penalization; Oncology; Pathway analysis; Pleiotropy; Sparse methods; Variable selection; LASSO; SUBSET; RISK; GENE
DOI
10.1186/s12874-021-01491-8
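Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services management)];
Discipline classification code
Abstract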
Background: Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. We can leverage this phenomenon, known as pleiotropy, to integrate multiple data sources in a joint analysis. Integrating additional information, such as gene pathway knowledge, can often improve statistical efficiency and biological interpretation. In this article, we propose statistical methods that incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits.
Methods: We propose novel feature selection methods for group variable selection in multi-task regression problems. We develop penalised likelihood methods that exploit different penalties to induce structured sparsity at the gene (or pathway) level and at the SNP level across all studies. We implement an alternating direction method of multipliers (ADMM) algorithm for our penalised regression methods. The performance of our approaches is compared with a subset-based meta-analysis approach on simulated data sets. A bootstrap sampling strategy is provided to explore the stability of the penalised methods.
Results: Our methods are applied to identify potential pleiotropy in a joint analysis of thyroid and breast cancers. The methods detected eleven potential pleiotropic SNPs and six pathways. A simulation study found that our method detected more true signals than a popular competing method while retaining a similar false discovery rate.
Conclusion: We developed feature selection methods for jointly analysing multiple logistic regression tasks where prior grouping knowledge is available. Our method performed well both in simulation studies and in a real data analysis of multiple cancers.
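Illustration: the abstract does not state the exact penalties used, so the following is only a minimal sketch of how a generic sparse group lasso penalty can induce sparsity at both the group (gene or pathway) and SNP level. It shows the proximal operator that would typically appear as one update step inside an ADMM loop. The function names, the mixing parameter `alpha`, and the non-overlapping grouping are assumptions made for illustration and are not taken from the paper.

```python
import math

def soft_threshold(x, t):
    """Elementwise lasso shrinkage: move x toward zero by t, clipping at zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def sparse_group_prox(v, groups, lam, alpha):
    """Proximal operator of lam * (alpha * ||b||_1 + (1 - alpha) * sum_g ||b_g||_2).

    v      : list of floats (hypothetical SNP coefficients)
    groups : list of index lists, one per gene or pathway (assumed non-overlapping)
    lam    : overall penalty strength (assumed name)
    alpha  : mix between SNP-level (1.0) and group-level (0.0) sparsity (assumed name)
    """
    out = [0.0] * len(v)
    for idx in groups:
        # Step 1: SNP-level sparsity via elementwise soft-thresholding.
        u = [soft_threshold(v[i], alpha * lam) for i in idx]
        norm = math.sqrt(sum(x * x for x in u))
        # Step 2: group-level sparsity; the whole group is zeroed when its norm is small.
        scale = max(0.0, 1.0 - (1.0 - alpha) * lam / norm) if norm > 0 else 0.0
        for i, x in zip(idx, u):
            out[i] = scale * x
    return out

# Two hypothetical groups: one with strong signal, one with weak signal.
coefs = [3.0, -4.0, 0.3, -0.2]
shrunk = sparse_group_prox(coefs, groups=[[0, 1], [2, 3]], lam=1.0, alpha=0.5)
print(shrunk)
```

In this toy input the weak second group is removed entirely, while the strong first group is kept with shrunken coefficients, which is the two-level selection behaviour the abstract describes.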
Pages: 12