Reliability and efficiency of algorithms for computing the significance of the Mann-Whitney test

Cited by: 9
Authors
Nagarajan, Niranjan [1]
Keich, Uri [2]
Affiliations
[1] Univ Maryland, CBCB, UMIACS, College Pk, MD 20742 USA
[2] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
Keywords
Numerical error; Exact computation; FFT; Significance probabilities; Wilcoxon; Approximation
DOI
10.1007/s00180-009-0148-x
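Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract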
Motivated by recent applications of the Mann-Whitney U test to large data sets, we took a critical look at current methods for computing its significance. Surprisingly, we found that the two fastest and most popular tools for exact computation of the test significance, the algorithms of Dinneen and Blakesley and of Harding, can exhibit large numerical errors even on moderately large datasets. In addition, another method, proposed by Pagano and Tritchler, suffers from a similar numerical instability and can produce inaccurate results. This motivated our development of a new algorithm, mw-sFFT, for the exact computation of the significance of the Mann-Whitney test with no ties. Among the class of exact algorithms that are numerically stable, mw-sFFT has the best complexity: O(m^2 n) versus O(m^2 n^2) for the others, where m and n are the two sample sizes. This asymptotic efficiency is also reflected in the practical runtime of the algorithm. We also present a rigorous analysis of the propagation of numerical errors in mw-sFFT to derive an error guarantee for the values it computes. The reliability and efficiency of mw-sFFT make it a valuable tool in computational applications, and we plan to provide open-source libraries for it in C++ and Matlab.
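To make "exact computation" concrete, here is a minimal sketch under stated assumptions. It is not mw-sFFT, nor the Dinneen-Blakesley or Harding algorithm. With no ties and under the null hypothesis, each of the C(m+n, m) rank arrangements is equally likely. Taking U as the number of pairs (x_i, y_j) with x_i > y_j, where x has m observations and y has n (an assumed convention), the null counts satisfy the classical recurrence c(u; m, n) = c(u - n; m - 1, n) + c(u; m, n - 1), found by asking whether the largest observation is an x or a y. Python's arbitrary-precision integers keep the result free of floating-point error, but the cost is on the order of m^2 n^2 in both time and memory. That is why faster stable methods matter for large samples. The function names null_counts, mw_pvalue_upper, and u_statistic are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def null_counts(m: int, n: int) -> tuple[int, ...]:
    """counts[u] = number of rank arrangements (no ties) with U = u.

    Hypothetical helper using the classical recurrence
    c(u; m, n) = c(u - n; m - 1, n) + c(u; m, n - 1).
    """
    if m == 0 or n == 0:
        return (1,)  # only U = 0 is possible
    counts = [0] * (m * n + 1)
    # Largest observation is an x: it beats all n y's, shifting U by n.
    for u, c in enumerate(null_counts(m - 1, n)):
        counts[u + n] += c
    # Largest observation is a y: it beats no x, so U is unchanged.
    for u, c in enumerate(null_counts(m, n - 1)):
        counts[u] += c
    return tuple(counts)


def mw_pvalue_upper(u_obs: int, m: int, n: int) -> Fraction:
    """Exact one-sided p-value P(U >= u_obs) as a rational number."""
    counts = null_counts(m, n)
    total = comb(m + n, m)  # all arrangements are equally likely under H0
    return Fraction(sum(counts[u_obs:]), total)


def u_statistic(x, y) -> int:
    """U = number of pairs (x_i, y_j) with x_i > y_j (assumes no ties)."""
    return sum(1 for xi in x for yj in y if xi > yj)


x = [1.2, 3.4, 5.6]
y = [0.5, 0.7, 2.0, 4.1]
u = u_statistic(x, y)
print(u, mw_pvalue_upper(u, len(x), len(y)))
```

For these samples U = 9, and the exact upper-tail p-value is 7/35 = 1/5.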
Pages: 605-622
Number of pages: 18
References (25 in total)
[1] Bickel DR. Degrees of differential gene expression: detecting biologically significant expression differences and estimating their magnitudes. Bioinformatics, 2004, 20(5): 682.
[2] Buckle N, Kraft C, van Eeden C. An approximation to Wilcoxon-Mann-Whitney distribution. Journal of the American Statistical Association, 1969, 64(326): 591.
[3] Dembo A. Large Deviation Techniques and Applications, 1998.
[4] Di Bucchianico A. J Stat Plan Infer, 1999, 79: 349. DOI 10.1016/S0378-3758(98)00261-4.
[5] Dinneen LC. Roy Stat Soc C-App, 1973, 22: 269.
[6] Fix E, Hodges JL. Significance probabilities of the Wilcoxon test. Annals of Mathematical Statistics, 1955, 26(2): 301-312.
[7] Froda S. Can J Stat, 2000, 28: 137.
[8] Harding EF. J R Stat Soc C-Appl, 1984, 33: 1.
[9] Hodges JL. J Educ Stat, 1990, 15: 249. DOI 10.2307/1165034.
[10] Jin R. Saddlepoint Approxim, 2003: 149.