Control procedures and estimators of the false discovery rate and their application in low-dimensional settings: an empirical investigation

Times cited: 24
Authors
Brinster, Regina [1, 2, 3]
Koettgen, Anna [4, 5]
Tayo, Bamidele O. [6]
Schumacher, Martin [2, 3]
Sekula, Peggy [4, 5]
Affiliations
[1] Heidelberg Univ, Inst Med Biometry & Informat, Neuenheimer Feld 130-3, D-69120 Heidelberg, Germany
[2] Univ Freiburg, Inst Med Biometry & Stat, Fac Med, Stefan Meier Str 26, D-79104 Freiburg, Germany
[3] Univ Freiburg, Med Ctr, Stefan Meier Str 26, D-79104 Freiburg, Germany
[4] Univ Freiburg, Fac Med, Inst Genet Epidemiol, Hugstetter Str 49, D-79106 Freiburg, Germany
[5] Univ Freiburg, Med Ctr, Hugstetter Str 49, D-79106 Freiburg, Germany
[6] Loyola Univ Chicago, Dept Publ Hlth Sci, Stritch Sch Med, Maywood, IL USA
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Keywords
False discovery rate; Simulation study; Low-dimensional setting; Q-value method; REJECTIVE MULTIPLE TEST; ASSOCIATION; TESTS;
DOI
10.1186/s12859-018-2081-x
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: When many (up to millions of) statistical tests are conducted in discovery set analyses such as genome-wide association studies (GWAS), approaches controlling the family-wise error rate (FWER) or the false discovery rate (FDR) are required to reduce the number of false positive decisions. Some methods were developed specifically for high-dimensional settings and rely in part on estimating the proportion of true null hypotheses. However, these approaches are also applied in low-dimensional settings such as replication set analyses, which may be restricted to a small number of specific hypotheses. The aim of this study was to compare different approaches in low-dimensional settings using (a) real data from the CKDGen Consortium and (b) a simulation study. Results: In both the application and the simulation, FWER approaches were less powerful than FDR control methods, regardless of whether a larger number of hypotheses was tested. The q-value method was the most powerful. However, its specificity, that is, its ability to retain true null hypotheses, was markedly reduced when the number of tested hypotheses was small. In this low-dimensional situation, the estimate of the proportion of true null hypotheses was biased. Conclusions: The results highlight the importance of a sizeable data set for reliable estimation of the proportion of true null hypotheses. Consequently, methods relying on this estimate should only be applied in high-dimensional settings. Furthermore, if the focus lies on testing a small number of hypotheses, as in replication settings, FWER methods rather than FDR methods should be preferred to maintain high specificity.
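To make the contrast in the abstract concrete, the following minimal Python sketch compares one FWER procedure (Bonferroni), one FDR procedure (the Benjamini-Hochberg step-up), and a simple estimate of the proportion of true null hypotheses. It is an illustration only, not the code used in the study. The function names, the eight example p-values, and the fixed tuning value `lam = 0.5` are assumptions made here; the q-value method usually chooses or smooths over this tuning parameter rather than fixing it.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """FWER control: reject H_i if p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg_reject(pvals, alpha=0.05):
    """FDR control: Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k with p_(k) <= (k / m) * alpha; 0 means no rejection.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


def pi0_fixed_lambda(pvals, lam=0.5):
    """Estimate the proportion of true nulls from the share of p-values
    above lam (simplified: a single fixed lam, an assumption here)."""
    m = len(pvals)
    n_above = sum(1 for p in pvals if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


# Hypothetical p-values for a small, replication-style set of 8 tests.
pvals = [0.001, 0.008, 0.012, 0.02, 0.04, 0.30, 0.60, 0.90]

print("Bonferroni rejections:", sum(bonferroni_reject(pvals)))
print("BH rejections:", sum(benjamini_hochberg_reject(pvals)))
print("pi0 estimate:", pi0_fixed_lambda(pvals))
```

With these made-up values, Bonferroni rejects fewer hypotheses than Benjamini-Hochberg, in line with the power ordering reported above. With only eight p-values and `lam = 0.5`, the estimate of the proportion of true nulls can move only in steps of 0.25, which gives a rough sense of why such an estimate may be unreliable when few hypotheses are tested.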
Pages: 10