Do multiple outcome measures require p-value adjustment?

Cited by: 974

Authors:

Feise R.J. [1]

Affiliation:

[1] Inst. of Evidence-Based Chiropractic, Fort Collins, CO 80528

Source:

BMC Medical Research Methodology, Volume 2, Issue 1

Keywords:

Chronic Fatigue Syndrome; Author Strategy; Composite Endpoint; Multivariate Test; Multiple Outcome Measure

DOI:

10.1186/1471-2288-2-8

Abstract:

Background: Readers may question the interpretation of findings in clinical trials when multiple outcome measures are used without adjustment of the p-value. This question arises because of the increased risk of Type I errors (findings of false "significance") when multiple simultaneous hypotheses are tested at set p-values. The primary aim of this study was to estimate the need to make appropriate p-value adjustments in clinical trials to compensate for a possible increased risk of committing Type I errors when multiple outcome measures are used.

Discussion: The classicists believe that the probability of finding at least one statistically significant test by chance alone, and so incorrectly declaring a difference, increases as the number of comparisons increases. The rationalists raise the following objections to that theory: 1) p-value adjustments are calculated based on how many tests are to be considered, and that number has been defined arbitrarily and variably; 2) p-value adjustments reduce the chance of making Type I errors, but they increase the chance of making Type II errors or require a larger sample size.

Summary: Readers should balance a study's statistical significance with the magnitude of effect, the quality of the study, and findings from other studies. Researchers facing multiple outcome measures might want to either select a primary outcome measure or use a global assessment measure, rather than adjusting the p-value.
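The Type I error inflation described in the abstract can be illustrated with a short calculation. The sketch below is not taken from the article: it assumes the tests are statistically independent (which multiple outcome measures in one trial usually are not), and the function names are hypothetical. It shows the familywise error rate for k tests at a per-test threshold, along with the Bonferroni and Šidák per-test thresholds that the classicist position would apply.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive across k tests.

    Assumption: the k tests are independent and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test threshold under the Bonferroni adjustment."""
    return alpha / k


def sidak_alpha(alpha: float, k: int) -> float:
    """Per-test threshold under the Sidak adjustment (independent tests)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


if __name__ == "__main__":
    alpha = 0.05
    for k in (1, 5, 10, 20):
        print(
            f"k={k:2d}  unadjusted FWER={familywise_error_rate(alpha, k):.3f}  "
            f"Bonferroni={bonferroni_alpha(alpha, k):.5f}  "
            f"Sidak={sidak_alpha(alpha, k):.5f}"
        )
```

Under these assumptions, ten unadjusted tests at 0.05 carry roughly a 40% chance of at least one false positive. The adjusted thresholds restore the overall rate to about 0.05, at the cost in power (Type II errors or larger samples) that the rationalist objection points to.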
