Interpreting observational studies: why empirical calibration is needed to correct p-values

Cited by: 156
Authors
Schuemie, Martijn J. [1, 2]
Ryan, Patrick B. [2, 3]
DuMouchel, William [2, 4]
Suchard, Marc A. [2, 5]
Madigan, David [2, 6]
Affiliations
[1] Erasmus MC, Dept Med Informat, Rotterdam, Netherlands
[2] Fdn Natl Inst Hlth, Bethesda, MD USA
[3] Janssen Res & Dev LLC, Titusville, NJ USA
[4] Oracle Hlth Sci, Burlington, MA USA
[5] Univ Calif Los Angeles, Sch Publ Hlth, Dept Biostat, Los Angeles, CA 90024 USA
[6] Columbia Univ, Dept Stat, New York, NY USA
Funding
US National Institutes of Health
Keywords
hypothesis testing; calibration; negative controls; observational studies; ORAL BISPHOSPHONATES; LIVER-INJURY; RISK; COHORT; PRESCRIPTION; VALIDATION; DIAGNOSIS; CANCER; CODES;
DOI
10.1002/sim.5925
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Often the literature makes assertions of medical product effects on the basis of 'p<0.05'. The underlying premise is that at this threshold, there is only a 5% probability that the observed effect would be seen by chance when in reality there is no effect. In observational studies, much more than in randomized trials, bias and confounding may undermine this premise. To test this premise, we selected three exemplar drug safety studies from literature, representing a case-control, a cohort, and a self-controlled case series design. We attempted to replicate these studies as best we could for the drugs studied in the original articles. Next, we applied the same three designs to sets of negative controls: drugs that are not believed to cause the outcome of interest. We observed how often p<0.05 when the null hypothesis is true, and we fitted distributions to the effect estimates. Using these distributions, we compute calibrated p-values that reflect the probability of observing the effect estimate under the null hypothesis, taking both random and systematic error into account. An automated analysis of scientific literature was performed to evaluate the potential impact of such a calibration. Our experiment provides evidence that the majority of observational studies would declare statistical significance when no effect is present. Empirical calibration was found to reduce spurious results to the desired 5% level. Applying these adjustments to literature suggests that at least 54% of findings with p<0.05 are not actually statistically significant and should be reevaluated. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
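The calibration step described in the abstract (fit a distribution to negative-control effect estimates, then score a new estimate against it) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the empirical null is Gaussian on the log effect scale, that each negative control comes with a log relative risk and a standard error, and that the fit is a simple profile-likelihood grid search. The function names and all numbers are hypothetical.

```python
import math


def fit_null(log_rrs, ses, grid_steps=2000):
    """Fit an assumed Gaussian empirical null N(mu, sigma^2) to negative controls.

    Each negative-control estimate y_i with standard error s_i (> 0) is modelled
    as y_i ~ N(mu, sigma^2 + s_i^2): sigma captures systematic error, s_i the
    random error. sigma is found by grid search; for a fixed sigma the
    maximum-likelihood mu is the precision-weighted mean.
    """
    max_sigma = max(max(log_rrs) - min(log_rrs), 1e-6)
    best = None
    for k in range(grid_steps + 1):
        sigma = max_sigma * k / grid_steps
        variances = [sigma ** 2 + s ** 2 for s in ses]
        weights = [1.0 / v for v in variances]
        mu = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
        log_lik = -0.5 * sum(
            math.log(v) + (y - mu) ** 2 / v for v, y in zip(variances, log_rrs)
        )
        if best is None or log_lik > best[0]:
            best = (log_lik, mu, sigma)
    return best[1], best[2]


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def traditional_p(log_rr, se):
    """Usual p-value: null centred at zero, random error only."""
    return two_sided_p(log_rr / se)


def calibrated_p(log_rr, se, mu, sigma):
    """p-value under the fitted empirical null (random plus systematic error)."""
    return two_sided_p((log_rr - mu) / math.sqrt(sigma ** 2 + se ** 2))


# Hypothetical negative controls: log relative risks that lean above zero even
# though no true effect is believed to exist, each with standard error 0.1.
negative_log_rrs = [0.3, 0.5, 0.1, 0.6, 0.4, 0.2, 0.7, 0.35]
negative_ses = [0.1] * len(negative_log_rrs)

mu, sigma = fit_null(negative_log_rrs, negative_ses)

# Hypothetical drug of interest: relative risk 1.5, standard error 0.1.
p_traditional = traditional_p(math.log(1.5), 0.1)
p_calibrated = calibrated_p(math.log(1.5), 0.1, mu, sigma)
print(f"mu={mu:.3f} sigma={sigma:.3f}")
print(f"traditional p={p_traditional:.5f} calibrated p={p_calibrated:.3f}")
```

With these made-up inputs the traditional p-value falls well below 0.05 while the calibrated one does not, because the estimate is no larger than what the negative controls show in the absence of any effect.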
Pages: 209-218
Number of pages: 10