A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk

Times cited: 10
Authors
Yurko, Ronald [1]
G'Sell, Max [1]
Roeder, Kathryn [1, 2]
Devlin, Bernie [3]
Affiliations
[1] Carnegie Mellon Univ, Dept Stat & Data Sci, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Dept Computat Biol, Pittsburgh, PA 15213 USA
[3] Univ Pittsburgh, Sch Med, Dept Psychiat, Pittsburgh, PA 15213 USA
Keywords
multiple hypothesis testing; false discovery rate; GWAS; eQTL; neuropsychiatric disorders; MULTIPLE; SCHIZOPHRENIA
DOI
10.1073/pnas.1918862117
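Chinese Library Classification codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]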
Subject classification codes
07; 0710; 09
Abstract
To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive P-value thresholding (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene-gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.
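As a rough illustration of how covariate-dependent thresholds enter AdaPT, here is a minimal Python sketch; it is not the authors' implementation. It assumes the FDP estimate of the original AdaPT procedure, (1 + A_t) / max(R_t, 1), where R_t counts P values at or below a covariate-dependent threshold s_t(x_i) and A_t counts "mirror" P values at or above 1 - s_t(x_i). The per-hypothesis `weights` are a hypothetical stand-in for a covariate score. The paper instead learns thresholds from 24 covariates with gradient boosted trees, and a real AdaPT update may use the partially masked P values, whereas this sketch shrinks all thresholds on a fixed, data-independent schedule.

```python
"""Simplified AdaPT-style sketch.

Assumption: fixed, hypothetical covariate weights and a fixed shrink schedule,
not the gradient-boosted-trees threshold model used in the paper.
"""
from typing import List, Sequence


def fdp_hat(pvals: Sequence[float], thresholds: Sequence[float]) -> float:
    """AdaPT-style FDP estimate (1 + A_t) / max(R_t, 1).

    R_t: number of P values at or below their threshold s(x_i) (candidate rejections).
    A_t: number of P values at or above 1 - s(x_i); under mirror-symmetric nulls
    this "pseudo-rejection" count estimates the nulls in the rejection region.
    """
    r = sum(p <= s for p, s in zip(pvals, thresholds))
    a = sum(p >= 1 - s for p, s in zip(pvals, thresholds))
    return (1 + a) / max(r, 1)


def adapt_sketch(
    pvals: Sequence[float],
    weights: Sequence[float],
    alpha: float = 0.1,
    s0: float = 0.45,
    shrink: float = 0.9,
    max_iter: int = 500,
) -> List[int]:
    """Shrink thresholds s_i = base * w_i until the FDP estimate is <= alpha.

    weights: hypothetical covariate scores in (0, 1]; a larger weight means the
    covariate suggests the hypothesis is more likely non-null, so it gets a
    more generous threshold. Returns indices of rejected hypotheses.
    """
    base = s0
    for _ in range(max_iter):
        # Thresholds stay in [0, 0.5] so rejection and mirror regions never overlap.
        thresholds = [min(base * w, 0.5) for w in weights]
        if fdp_hat(pvals, thresholds) <= alpha:
            return [i for i, (p, s) in enumerate(zip(pvals, thresholds)) if p <= s]
        base *= shrink  # thresholds only move down, never up
    return []  # no threshold reached the target level


if __name__ == "__main__":
    p = [0.001, 0.002, 0.003, 0.004, 0.005, 0.6, 0.7, 0.8, 0.9, 0.3]
    print(adapt_sketch(p, [1.0] * 10, alpha=0.5))
```

Because of the "+1" in the numerator, an estimate at or below alpha requires at least 1/alpha candidate rejections; with only five small P values, for example, nothing can be reported at alpha = 0.1. Setting all weights equal reduces the sketch to a single global threshold, so any gain in power comes entirely from how informative the covariate weights are.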
Pages: 15028-15035
Number of pages: 8
    [J]. ANNALS OF STATISTICS, 2001, 29 (05) : 1189 - 1232