Do multiple outcome measures require p-value adjustment?

Cited by: 974
Author
Feise R.J. [1]
Affiliation
[1] Inst. of Evidence-Based Chiropractic, Fort Collins, CO 80528
Keywords
Chronic Fatigue Syndrome; Author Strategy; Composite Endpoint; Multivariate Test; Multiple Outcome Measure
DOI
10.1186/1471-2288-2-8
Abstract
Background: Readers may question the interpretation of findings in clinical trials when multiple outcome measures are used without adjustment of the p-value. This question arises because of the increased risk of Type I errors (findings of false "significance") when multiple simultaneous hypotheses are tested at set p-values. The primary aim of this study was to estimate the need to make appropriate p-value adjustments in clinical trials to compensate for a possible increased risk of committing Type I errors when multiple outcome measures are used.

Discussion: The classicists believe that the chance of finding at least one test statistically significant due to chance, and incorrectly declaring a difference, increases as the number of comparisons increases. The rationalists have the following objections to that theory: 1) P-value adjustments are calculated based on how many tests are to be considered, and that number has been defined arbitrarily and variably; 2) P-value adjustments reduce the chance of making Type I errors, but they increase the chance of making Type II errors or of needing to increase the sample size.

Summary: Readers should balance a study's statistical significance with the magnitude of effect, the quality of the study, and findings from other studies. Researchers facing multiple outcome measures might want to either select a primary outcome measure or use a global assessment measure, rather than adjusting the p-value.
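
The trade-off described in the abstract can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes k independent tests, each run at the same per-test level alpha, and the function names are hypothetical. Under that independence assumption the chance of at least one false "significant" result is 1 - (1 - alpha)^k, which is about 0.40 for ten tests at alpha = 0.05. The Bonferroni and Sidak adjustments shown are two common ways of shrinking the per-test level to hold that chance near alpha.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one Type I error across k tests.

    Assumes the k tests are independent and all null hypotheses are true.
    """
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test level under the Bonferroni adjustment (alpha divided by k)."""
    return alpha / k


def sidak_alpha(alpha: float, k: int) -> float:
    """Per-test level under the Sidak adjustment (exact for independent tests)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


# Illustrative values only: one to twenty outcome measures at alpha = 0.05.
for k in (1, 5, 10, 20):
    unadjusted = familywise_error_rate(0.05, k)
    print(
        f"k={k:2d}  unadjusted FWER={unadjusted:.3f}  "
        f"Bonferroni alpha={bonferroni_alpha(0.05, k):.4f}  "
        f"Sidak alpha={sidak_alpha(0.05, k):.4f}"
    )
```

The shrinking per-test levels in the output are the source of the second objection in the abstract: a stricter threshold per test lowers power, so Type II errors become more likely unless the sample size grows. Outcome measures in a real trial are usually correlated, so these independence-based figures are an approximation rather than an exact description of any particular study.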
Pages: 1 / 4
Page count: 3