Confidence intervals for the Mann-Whitney test

Cited by: 37
Authors
Perme, Maja Pohar [1]
Manevski, Damjan [1]
Affiliations
[1] Univ Ljubljana, Fac Med, Inst Biostat & Med Informat, Vrazov Trg 2, Ljubljana 1000, Slovenia
Keywords
Mann-Whitney; confidence interval; area under ROC curve; effect size; small sample size; probabilistic index; ROC CURVE; AREA; INFERENCE;
DOI
10.1177/0962280218814556
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Abstract
The Mann-Whitney test is a commonly used non-parametric alternative to the two-sample t-test. Despite its frequent use, it is only rarely accompanied by confidence intervals for an effect size. If reported, the effect size is usually measured by the difference of medians or the shift between the two distribution locations. Neither of these two measures directly corresponds to the test statistic of the Mann-Whitney test, so the interpretation of the test results and of the confidence intervals may differ substantially. In this paper, we focus on the probability that random variable X is lower than random variable Y. This measure is often referred to as the degree of overlap or the probabilistic index; it is in one-to-one relationship with the Mann-Whitney test statistic and equals the area under the ROC curve. Several methods have been proposed for constructing a confidence interval for this measure; we review the most promising ones and explain the ideas behind them. We study the properties of different variance estimators and the small-sample problems of confidence interval construction. We identify scenarios in which the existing approaches yield inadequate coverage probabilities. We conclude that the DeLong variance estimator is a reliable option regardless of the scenario, but confidence intervals should be constructed on the logit scale to avoid limits above 1 or below 0 and the poor coverage probability that follows. A correction is needed when all values from one sample are smaller than all values of the other. We propose a method that improves the coverage probability in these cases as well.
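
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes ties are counted as one half (the abstract only speaks of X being lower than Y), it uses the usual placement-value form of the DeLong variance estimator, and it builds the interval on the logit scale with the delta method. The function names probabilistic_index and delong_logit_ci are hypothetical. The correction the paper proposes for completely separated samples is not reproduced; the sketch simply raises an error in that case.

```python
import math
from statistics import NormalDist, variance


def probabilistic_index(x, y):
    """Estimate theta = P(X < Y) + 0.5 * P(X = Y) from two samples.

    Also returns the placement values used by the DeLong variance estimator.
    Counting ties as 0.5 is an assumption of this sketch.
    """
    def psi(a, b):
        return 1.0 if a < b else (0.5 if a == b else 0.0)

    m, n = len(x), len(y)
    # Placement value of each x: share of y values lying above it.
    v_x = [sum(psi(a, b) for b in y) / n for a in x]
    # Placement value of each y: share of x values lying below it.
    v_y = [sum(psi(a, b) for a in x) / m for b in y]
    theta = sum(v_x) / m  # equals U / (m * n) for the Mann-Whitney U statistic
    return theta, v_x, v_y


def delong_logit_ci(x, y, level=0.95):
    """Logit-scale confidence interval for theta with the DeLong variance."""
    theta, v_x, v_y = probabilistic_index(x, y)
    # DeLong variance: sample variances of the placement values,
    # each divided by its own sample size.
    var = variance(v_x) / len(x) + variance(v_y) / len(y)
    if theta in (0.0, 1.0) or var == 0.0:
        # Completely separated samples: the logit is undefined here.
        # The paper proposes a correction; this sketch does not implement it.
        raise ValueError("logit interval undefined for theta of 0 or 1")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    logit = math.log(theta / (1.0 - theta))
    # Delta method: se(logit(theta)) = se(theta) / (theta * (1 - theta)).
    half_width = z * math.sqrt(var) / (theta * (1.0 - theta))

    def expit(t):
        return 1.0 / (1.0 + math.exp(-t))

    return theta, expit(logit - half_width), expit(logit + half_width)


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = [3, 5, 6, 7]
    print(delong_logit_ci(x, y))
```

Because the limits are back-transformed from the logit scale, they always fall strictly between 0 and 1, which is the property the abstract argues for.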
Pages: 3755-3768
Page count: 14
Related papers
22 in total
[1]   Small sample inference for probabilistic index models [J].
Amorim, G. ;
Thas, O. ;
Vermeulen, K. ;
Vansteelandt, S. ;
De Neve, J. .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2018, 121 :137-148
[2]   AREA ABOVE ORDINAL DOMINANCE GRAPH AND AREA BELOW RECEIVER OPERATING CHARACTERISTIC GRAPH [J].
BAMBER, D .
JOURNAL OF MATHEMATICAL PSYCHOLOGY, 1975, 12 (04) :387-415
[3]   Using the ROC curve for gauging treatment effect in clinical trials [J].
Brumback, LC ;
Pepe, MS ;
Alonzo, TA .
STATISTICS IN MEDICINE, 2006, 25 (04) :575-590
[4]   A Regression Framework for Rank Tests Based on the Probabilistic Index Model [J].
De Neve, Jan ;
Thas, Olivier .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2015, 110 (511) :1276-1283
[5]   de Winter, J. F. C. .
PRACTICAL ASSESSMENT, RESEARCH, AND EVALUATION, 2010, 15. DOI: 10.7275/bj1p-ts64
[6]   COMPARING THE AREAS UNDER 2 OR MORE CORRELATED RECEIVER OPERATING CHARACTERISTIC CURVES - A NONPARAMETRIC APPROACH [J].
DELONG, ER ;
DELONG, DM ;
CLARKEPEARSON, DI .
BIOMETRICS, 1988, 44 (03) :837-845
[7]   A comparison of confidence/credible interval methods for the area under the ROC curve for continuous diagnostic tests with small sample size [J].
Feng, Dai ;
Cortese, Giuliana ;
Baumgartner, Richard .
STATISTICAL METHODS IN MEDICAL RESEARCH, 2017, 26 (06) :2603-2621
[8]   Comparison of three methods for estimating the standard error of the area under the curve in ROC analysis of quantitative data [J].
Hajian-Tilaki, KO ;
Hanley, JA .
ACADEMIC RADIOLOGY, 2002, 9 (11) :1278-1285
[9]   THE MEANING AND USE OF THE AREA UNDER A RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE [J].
HANLEY, JA ;
MCNEIL, BJ .
RADIOLOGY, 1982, 143 (01) :29-36
[10]   Mann-Whitney test is not just a test of medians: differences in spread can be important [J].
Hart, A .
BRITISH MEDICAL JOURNAL, 2001, 323 (7309) :391-393