Fast surrogates of U-statistics

被引:11
作者
Lin, N. [1 ]
Xi, R. [1 ]
机构
[1] Washington Univ, Dept Math, St Louis, MO 63130 USA
基金
美国国家科学基金会;
关键词
Compendex;
D O I
10.1016/j.csda.2009.08.009
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
U-statistics have long been known as a class of nonparametric estimators with good theoretical properties such as unbiasedness and asymptotic normality. However, their applications in modern statistical analysis are limited due to the high computational complexity, especially when massive data sets are becoming more and more common nowadays. In this paper, using the "divide-and-conquer" technique, we developed two surrogates of the U-statistics, aggregated U-statistics and average aggregated U-statistics, both of which are shown asymptotically equivalent to U-statistics and computationally much more efficient. When dividing the raw data set into K subsets, the two proposed estimators reduce the computational complexity from O(N-m) to O(K(N/K)(m)), which results in significant time reduction as long as K = o(N) and m >= 2. The merit of the two proposed statistics is demonstrated by both simulation studies and real data examples. (C) 2009 Elsevier B.V. All rights reserved.
引用
收藏
页码:16 / 24
页数:9
相关论文
共 21 条
  • [1] [Anonymous], 1983, Stat Prob Letters, DOI [10.1016/0167-7152(83)90054-8, DOI 10.1016/0167-7152(83)90054-8]
  • [2] Kendall's tau for serial dependence
    Ferguson, TS
    Genest, C
    Hallin, M
    [J]. CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2000, 28 (03): : 587 - 604
  • [3] Asymptotic approximations to the distribution of Kendall's sample
    Gatto, Riccardo
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2009, 79 (05) : 671 - 679
  • [4] Clustering data streams
    Guha, S
    Mishra, N
    Motwani, R
    O'Callaghan, L
    [J]. 41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, : 359 - 366
  • [5] A weighted multivariate signed-rank test for cluster-correlated data
    Haataja, Riina
    Larocque, Denis
    Nevalainen, Jaakko
    Oja, Hannu
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2009, 100 (06) : 1107 - 1119
  • [6] A NON-PARAMETRIC TEST OF INDEPENDENCE
    HOEFFDING, W
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1948, 19 (04): : 546 - 557
  • [7] A CLASS OF STATISTICS WITH ASYMPTOTICALLY NORMAL DISTRIBUTION
    HOEFFDING, W
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1948, 19 (03): : 293 - 325
  • [8] Hulten G, 2001, P 7 ACM SIGKDD INT C, P97, DOI [10.1145/502512.502529, DOI 10.1145/502512.502529]
  • [9] A new measure of rank correlation
    Kendall, MG
    [J]. BIOMETRIKA, 1938, 30 : 81 - 93
  • [10] Koroljuk V. S., 1994, THEORY U STAT