Classification of sparse high-dimensional vectors

Cited by: 24
Authors
Ingster, Yuri I. [2 ]
Pouet, Christophe [1 ]
Tsybakov, Alexandre B. [3 ,4 ]
Affiliations
[1] Univ Provence, LATP, F-13453 Marseille 13, France
[2] St Petersburg State Electrotech Univ, St Petersburg 197376, Russia
[3] Univ Paris 06, LPMA, F-75252 Paris 05, France
[4] CREST, Stat Lab, F-92240 Malakoff, France
Source
PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES | 2009 / Vol. 367 / No. 1906
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Bayes risk; classification boundary; high-dimensional data; optimal classifier; sparse vectors; ACUTE LYMPHOBLASTIC-LEUKEMIA; HIGHER CRITICISM; MIXTURES;
DOI
10.1098/rsta.2009.0156
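Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07 ; 0710 ; 09 ;
Abstract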
We study the problem of classification of d-dimensional vectors into two classes (one of which is 'pure noise') based on a training sample of size m. The main specific feature is that the dimension d can be very large. We suppose that the difference between the distribution of the population and that of the noise is only in a shift, which is a sparse vector. For Gaussian noise, fixed sample size m, and dimension d that tends to infinity, we obtain the sharp classification boundary, i.e. the necessary and sufficient conditions for the possibility of successful classification. We propose classifiers attaining this boundary. We also give extensions of the result to the case where the sample size m depends on d and satisfies the condition (log m)/log d -> gamma, 0 <= gamma < 1, and to the case of non-Gaussian noise satisfying the Cramer condition.
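The setting in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the classifiers proposed in the paper: the noise is taken to be standard Gaussian, the sparse shift has equal nonzero entries, and the classifier (threshold the training mean, then apply a linear rule on the selected coordinates) is a simple stand-in chosen for illustration. All function names and parameter values are hypothetical.

```python
import numpy as np


def make_sparse_shift(d, n_nonzero, amplitude, rng):
    """Return a d-dimensional shift with n_nonzero entries equal to amplitude."""
    v = np.zeros(d)
    support = rng.choice(d, size=n_nonzero, replace=False)
    v[support] = amplitude
    return v


def fit_threshold_classifier(train, threshold):
    """Estimate the shift from an (m, d) training sample of the shifted class.

    Coordinates whose scaled mean sqrt(m) * |mean| does not exceed the
    threshold are set to zero (illustrative rule, not the paper's).
    """
    m = train.shape[0]
    mean = train.mean(axis=0)
    selected = np.sqrt(m) * np.abs(mean) > threshold
    return np.where(selected, mean, 0.0)


def classify(z, weights):
    """Return 1 if z is assigned to the shifted class, 0 if to pure noise."""
    score = weights @ z - 0.5 * (weights @ weights)
    return int(score > 0)


def simulate_error_rate(d, m, n_nonzero, amplitude, n_test, seed):
    """Empirical misclassification rate over n_test noise and n_test shifted vectors."""
    rng = np.random.default_rng(seed)
    v = make_sparse_shift(d, n_nonzero, amplitude, rng)
    train = v + rng.standard_normal((m, d))
    # Universal threshold sqrt(2 log d): an assumption made for this sketch.
    weights = fit_threshold_classifier(train, np.sqrt(2.0 * np.log(d)))
    errors = 0
    for _ in range(n_test):
        errors += classify(rng.standard_normal(d), weights) != 0
        errors += classify(v + rng.standard_normal(d), weights) != 1
    return errors / (2 * n_test)


if __name__ == "__main__":
    print(simulate_error_rate(d=10_000, m=5, n_nonzero=50, amplitude=3.0,
                              n_test=200, seed=0))
```

Lowering `amplitude` or `n_nonzero` in this sketch makes the two classes harder to separate, which is the regime where the classification boundary described in the abstract becomes relevant.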
Pages: 4427-4448
Page count: 22
Related papers
22 in total
  • [1] Estimation and confidence sets for sparse normal mixtures
    Cai, T. Tony
    Jin, Jiashun
    Low, Mark G.
    [J]. ANNALS OF STATISTICS, 2007, 35 (06) : 2421 - 2449
  • [2] Higher criticism for detecting sparse heterogeneous mixtures
    Donoho, D
    Jin, JS
    [J]. ANNALS OF STATISTICS, 2004, 32 (03) : 962 - 994
  • [3] Higher criticism thresholding: Optimal feature selection when useful features are rare and weak
    Donoho, David
    Jin, Jiashun
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2008, 105 (39) : 14790 - 14795
  • [4] Feature selection by higher criticism thresholding achieves the optimal phase diagram
    Donoho, David
    Jin, Jiashun
    [J]. PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2009, 367 (1906) : 4449 - 4470
  • [5] HALL P, 2009, INNOVATED HIGHER CRI
  • [6] Theoretical measures of relative performance of classifiers for high dimensional data with small sample sizes
    Hall, Peter
    Pittelkow, Yvonne
    Ghosh, Malay
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2008, 70 : 159 - 173
  • [7] HAUPT J, 2008, P 42 AS C SIGN SYST, P1727, DOI 10.1109/ACSSC.2008.5074721
  • [8] Ibragimov I. A., 1981, STAT ESTIMATION ASYM, V16
  • [9] Ingster Y. I., 2001, I. Math. Methods Statist, V10, P395
  • [10] Ingster Y. I., 2002, ZAP NAUCHN SEM S PET, V294, p[88, 261]