Generalized α-investing: definitions, optimality results and application to public databases

Times cited: 42
Authors
Aharoni, Ehud [1]
Rosset, Saharon [2]
Affiliations
[1] IBM Res Lab Haifa, IL-31905 Haifa, Israel
[2] Tel Aviv Univ, IL-69978 Tel Aviv, Israel
Keywords
alpha-investing; alpha-spending; False discovery rate; Familywise error rate; Multiple comparisons; Association
DOI
10.1111/rssb.12048
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
The increasing prevalence and utility of large public databases necessitate the development of appropriate methods for controlling false discovery. Motivated by this challenge, we discuss the generic problem of testing a possibly infinite stream of null hypotheses. In this context, Foster and Stine suggested a novel method named alpha-investing for controlling a false discovery measure known as mFDR. We develop a more general procedure for controlling mFDR, of which alpha-investing is a special case. We show that, in common practical situations, the general procedure can be optimized to produce an expected reward optimal version, which is more powerful than alpha-investing. We then present the concept of a quality preserving database, originally introduced by Aharoni and co-workers, which formalizes efficient public database management to save costs and to control false discovery simultaneously. We show how one variant of generalized alpha-investing can be used to control mFDR in a quality preserving database and lead to a significant reduction in costs compared with the naive approaches for controlling the familywise error rate implemented by Aharoni and co-workers.
Pages: 771-794
Page count: 24
Related papers
18 in total
  • [1] The Quality Preserving Database: A Computational Framework for Encouraging Collaboration, Enhancing Power and Controlling False Discovery
    Aharoni, Ehud
    Neuvirth, Hani
    Rosset, Saharon
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2011, 8 (05) : 1431 - 1437
  • [2] The influenza virus resource at the national center for biotechnology information
    Bao, Yiming
    Bolotov, Pavel
    Dernovoy, Dmitry
    Kiryutin, Boris
    Zaslavsky, Leonid
    Tatusova, Tatiana
    Ostell, Jim
    Lipman, David
    [J]. JOURNAL OF VIROLOGY, 2008, 82 (02) : 596 - 601
  • [3] Benjamini Y, 2001, ANN STAT, V29, P1165
  • [4] CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING
    BENJAMINI, Y
    HOCHBERG, Y
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) : 289 - 300
  • [5] Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
    Burton, Paul R.
    Clayton, David G.
    Cardon, Lon R.
    Craddock, Nick
    Deloukas, Panos
    Duncanson, Audrey
    Kwiatkowski, Dominic P.
    McCarthy, Mark I.
    Ouwehand, Willem H.
    Samani, Nilesh J.
    Todd, John A.
    Donnelly, Peter
    Barrett, Jeffrey C.
    Davison, Dan
    Easton, Doug
    Evans, David
    Leung, Hin-Tak
    Marchini, Jonathan L.
    Morris, Andrew P.
    Spencer, Chris C. A.
    Tobin, Martin D.
    Attwood, Antony P.
    Boorman, James P.
    Cant, Barbara
    Everson, Ursula
    Hussey, Judith M.
    Jolley, Jennifer D.
    Knight, Alexandra S.
    Koch, Kerstin
    Meech, Elizabeth
    Nutland, Sarah
    Prowse, Christopher V.
    Stevens, Helen E.
    Taylor, Niall C.
    Walters, Graham R.
    Walker, Neil M.
    Watkins, Nicholas A.
    Winzer, Thilo
    Jones, Richard W.
    McArdle, Wendy L.
    Ring, Susan M.
    Strachan, David P.
    Pembrey, Marcus
    Breen, Gerome
    St Clair, David
    Caesar, Sian
    Gordon-Smith, Katherine
    Jones, Lisa
    Fraser, Christine
    Green, Elain K.
    [J]. NATURE, 2007, 447 (7145) : 661 - 678
  • [6] Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls
    Craddock, Nick
    Hurles, Matthew E.
    Cardin, Niall
    Pearson, Richard D.
    Plagnol, Vincent
    Robson, Samuel
    Vukcevic, Damjan
    Barnes, Chris
    Conrad, Donald F.
    Giannoulatou, Eleni
    Holmes, Chris
    Marchini, Jonathan L.
    Stirrups, Kathy
    Tobin, Martin D.
    Wain, Louise V.
    Yau, Chris
    Aerts, Jan
    Ahmad, Tariq
    Andrews, T. Daniel
    Arbury, Hazel
    Attwood, Anthony
    Auton, Adam
    Ball, Stephen G.
    Balmforth, Anthony J.
    Barrett, Jeffrey C.
    Barroso, Ines
    Barton, Anne
    Bennett, Amanda J.
    Bhaskar, Sanjeev
    Blaszczyk, Katarzyna
    Bowes, John
    Brand, Oliver J.
    Braund, Peter S.
    Bredin, Francesca
    Breen, Gerome
    Brown, Morris J.
    Bruce, Ian N.
    Bull, Jaswinder
    Burren, Oliver S.
    Burton, John
    Byrnes, Jake
    Caesar, Sian
    Clee, Chris M.
    Coffey, Alison J.
    Connell, John M. C.
    Cooper, Jason D.
    Dominiczak, Anna F.
    Downes, Kate
    Drummond, Hazel E.
    Dudakia, Darshna
    [J]. NATURE, 2010, 464 (7289) : 713 - U86
  • [7] α-investing: a procedure for sequential control of expected false discoveries
    Foster, Dean P.
    Stine, Robert A.
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2008, 70 : 429 - 444
  • [8] CURRENT CONCEPTS Genomewide Association Studies and Human Disease
    Hardy, John
    Singleton, Andrew
    [J]. NEW ENGLAND JOURNAL OF MEDICINE, 2009, 360 (17) : 1759 - 1768
  • [9] Genome-wide association studies for common diseases and complex traits
    Hirschhorn, JN
    Daly, MJ
    [J]. NATURE REVIEWS GENETICS, 2005, 6 (02) : 95 - 108
  • [10] Why most published research findings are false
    Ioannidis, JPA
    [J]. PLOS MEDICINE, 2005, 2 (08) : 696 - 701