Reproducibility probability estimation and testing for the Wilcoxon rank-sum test

Times cited: 19
Authors
De Capitani, L. [1 ]
De Martini, D. [1 ]
Affiliation
[1] Univ Milano Bicocca, Dip Stat & Metodi Quantitativi, I-20126 Milan, Italy
Keywords
RP-testing; risk indexes; asymptotic power approximations; agreement indexes; power estimation; plug-in power estimation; 62G10; 62G09; 62P10; MANN-WHITNEY TEST; SAMPLE-SIZE DETERMINATION; NONPARAMETRIC-TESTS; POWER
DOI
10.1080/00949655.2013.825721
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The reproducibility probability (RP) of a statistically significant outcome is the true power of a statistical test, and its estimate is a useful indicator of the stability of the test result. RP-testing consists of testing statistical hypotheses using an RP estimator as the test statistic. In the parametric framework, the RP-based test and the classical test are equivalent, whereas in the nonparametric framework RP-testing can be performed only approximately. In this work, we evaluate through an extensive simulation study the performance of several semi-parametric and nonparametric RP estimators (RPEs) for the Wilcoxon rank-sum (WRS) test. RPEs have two tasks: to perform RP-testing and to estimate the RP. To compare the RPEs, we adopt risk indexes (e.g. the mean square error (MSE)) and an index of agreement between the outcomes of the WRS test and the RP-based test. Results indicate that the rate of disagreement tends to zero as the sample size increases; with finite samples (20-200 observations per group), the overall rate of disagreement is 0.15% for semi-parametric RPEs and 0.58% for nonparametric ones. Concerning risk measures, no RPE dominates the others: for high power values, nonparametric RPEs have the lowest MSE; on average, the semi-parametric RPE based on the upper bound of the variance of the test statistic performs best; nevertheless, the relative gains between the best and the worst RPEs are quite small (5-10%). To conclude, well-approximated RP-testing for the WRS test can be performed by adopting a semi-parametric RPE. Since nonparametric plug-in RPEs perform well in the presence of high reproducibility, their adoption is suggested for evaluating the stability of test results, for example those of clinical trials.
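To make the plug-in idea concrete, here is a minimal Python sketch of one possible nonparametric plug-in RPE for the one-sided WRS test of H0: P(X < Y) = 1/2 against H1: P(X < Y) > 1/2. It is not the authors' code and not necessarily any of the estimators compared in the paper. It assumes continuous data (no ties in the variance plug-in), the normal approximation of the Mann-Whitney form U of the statistic, and the RP-testing rule "reject H0 when the estimated RP exceeds 1/2". The approximate power Phi((mn*p - c)/sigma1) is evaluated with p = P(X < Y) and the variance components P(X1 < Y, X2 < Y) and P(X < Y1, X < Y2) replaced by their U-statistic estimates. The names `mann_whitney_u`, `plugin_rp_estimate` and `rp_test` are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U = #{(i, j): x_i < y_j}, counting ties as 1/2."""
    return sum(1.0 if xi < yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)


def plugin_rp_estimate(x: Sequence[float], y: Sequence[float],
                       alpha: float = 0.05) -> float:
    """Hypothetical plug-in RP estimate for the one-sided WRS test.

    Assumption: continuous data (no ties), normal approximation of U.
    """
    m, n = len(x), len(y)
    if m < 2 or n < 2:
        raise ValueError("need at least two observations per group")
    mn = m * n
    u = mann_whitney_u(x, y)
    p = u / mn  # estimate of P(X < Y)
    # c[j] = #{i: x_i < y_j},  d[i] = #{j: x_i < y_j}
    c = [sum(xi < yj for xi in x) for yj in y]
    d = [sum(xi < yj for yj in y) for xi in x]
    # Unbiased U-statistic estimates of P(X1 < Y, X2 < Y) and P(X < Y1, X < Y2)
    p1 = sum(cj * (cj - 1) for cj in c) / (n * m * (m - 1))
    p2 = sum(di * (di - 1) for di in d) / (m * n * (n - 1))
    # Plug-in variance of U under the alternative, floored at zero
    var1 = mn * (p * (1 - p) + (m - 1) * (p1 - p * p) + (n - 1) * (p2 - p * p))
    var1 = max(var1, 0.0)
    # Normal-approximation critical value of U under H0
    sigma0 = (mn * (m + n + 1) / 12) ** 0.5
    crit = mn / 2 + _STD_NORMAL.inv_cdf(1 - alpha) * sigma0
    margin = u - crit  # estimated E[U] minus the critical value
    if var1 == 0.0:  # degenerate case, e.g. completely separated samples
        return 1.0 if margin > 0 else 0.0 if margin < 0 else 0.5
    return _STD_NORMAL.cdf(margin / var1 ** 0.5)


def rp_test(x: Sequence[float], y: Sequence[float], alpha: float = 0.05) -> bool:
    """RP-testing: reject H0 when the estimated RP exceeds 1/2."""
    return plugin_rp_estimate(x, y, alpha) > 0.5
```

For example, `plugin_rp_estimate(range(10), range(20, 30))` returns 1.0 because the two samples are completely separated. In this simplified sketch U equals mn times the estimate of p, so the rule "RPE > 1/2" reproduces exactly the normal-approximation decision U > c; in the nonparametric setting studied in the paper the agreement between RP-testing and the WRS test is only approximate.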
Pages: 468-493
Page count: 26