A simulation study of the strength of evidence in the recommendation of medications based on two trials with statistically significant results

Cited by: 15
Authors
van Ravenzwaaij, Don [1]
Ioannidis, John P. A. [2, 3, 4, 5]
Affiliations
[1] Univ Groningen, Dept Psychol, Groningen, Netherlands
[2] Stanford Univ, Dept Med, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[5] Stanford Univ, Meta Res Innovat Ctr Stanford METRICS, Stanford, CA 94305 USA
Source
PLOS ONE | 2017 / Volume 12 / Issue 03
Keywords
P-VALUES; CONFIDENCE-INTERVALS; RANDOMIZED-TRIALS; CLINICAL-TRIALS; HYPOTHESIS; TESTS;
DOI
10.1371/journal.pone.0173184
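Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract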
A typical rule that has been used for the endorsement of new medications by the Food and Drug Administration is to have two trials, each convincing on its own, demonstrating effectiveness. "Convincing" may be subjectively interpreted, but the use of p-values and the focus on statistical significance (in particular with p < .05 being coined significant) is pervasive in clinical research. Therefore, in this paper, we calculate with simulations what it means to have exactly two trials, each with p < .05, in terms of the actual strength of evidence quantified by Bayes factors. Our results show that different cases where two trials have a p-value below .05 have wildly differing Bayes factors. Bayes factors of at least 20 in favor of the alternative hypothesis are not necessarily achieved and they fail to be reached in a large proportion of cases, in particular when the true effect size is small (0.2 standard deviations) or zero. In a non-trivial number of cases, evidence actually points to the null hypothesis, in particular when the true effect size is zero, when the number of trials is large, and when the number of participants in both groups is low. We recommend use of Bayes factors as a routine tool to assess endorsement of new medications, because Bayes factors consistently quantify strength of evidence. Use of p-values may lead to paradoxical and spurious decision-making regarding the use of new medications.
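The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a two-group design with known unit variance, a two-sided z-test, and a closed-form Bayes factor under a normal prior on the effect size (prior standard deviation `tau`, a hypothetical choice), pooled over all trials in a set. The function names and parameter values are illustrative only.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for p-values


def two_sided_p(z):
    """Two-sided p-value for a z statistic."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def bf10_normal_prior(diff, se, tau=1.0):
    """Bayes factor for H1 (delta ~ N(0, tau^2)) over H0 (delta = 0).

    Assumption: known unit variance, so diff ~ N(delta, se^2).
    This is a stand-in, not necessarily the Bayes factor used in the paper.
    """
    var0 = se ** 2               # variance of diff under H0
    var1 = se ** 2 + tau ** 2    # marginal variance of diff under H1
    log_bf = 0.5 * math.log(var0 / var1) \
        + 0.5 * diff ** 2 * (1.0 / var0 - 1.0 / var1)
    return math.exp(log_bf)


def simulate(n_trials, n_per_group, delta, n_sims=2000, seed=1):
    """Return pooled BF10 values for simulated sets of trials in which
    exactly two trials reach two-sided p < .05."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_group)  # SE of a mean difference, sd = 1
    bayes_factors = []
    for _ in range(n_sims):
        # Draw each trial's observed mean difference directly.
        diffs = [rng.gauss(delta, se) for _ in range(n_trials)]
        n_sig = sum(two_sided_p(d / se) < 0.05 for d in diffs)
        if n_sig == 2:
            # Pool all trials (equal sizes), including non-significant ones.
            pooled_diff = sum(diffs) / n_trials
            pooled_se = se / math.sqrt(n_trials)
            bayes_factors.append(bf10_normal_prior(pooled_diff, pooled_se))
    return bayes_factors


if __name__ == "__main__":
    for n_trials in (2, 5, 20):
        bfs = simulate(n_trials=n_trials, n_per_group=50, delta=0.0, n_sims=20000)
        if bfs:
            share = sum(bf >= 20 for bf in bfs) / len(bfs)
            print(f"{n_trials} trials: {len(bfs)} qualifying sets, "
                  f"share with BF10 >= 20: {share:.2f}")
```

Varying `n_trials`, `n_per_group`, and `delta` shows how sets of trials that all satisfy "exactly two with p < .05" can carry very different pooled evidence under these assumptions.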
Pages: 16