NONPARAMETRIC CLASSIFICATION WITH MISSING DATA

Citations: 0
Authors
Sell, Torben [1, 2]
Berrett, Thomas B. [3]
Cannings, Timothy I. [1, 2]
Affiliations
[1] Univ Edinburgh, Sch Math, Edinburgh, Scotland
[2] Univ Edinburgh, Maxwell Inst Math Sci, Edinburgh, Scotland
[3] Univ Warwick, Dept Stat, Coventry, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Missing data; classification; minimax; MINIMAX RATE; DISCRIMINATION;
DOI
10.1214/24-AOS2389
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes
020208; 070103; 0714;
Abstract
We introduce a new nonparametric framework for classification problems in the presence of missing data. The key aspect of our framework is that the regression function decomposes into an anova-type sum of orthogonal functions, of which some (or even many) may be zero. Working under a general missingness setting, which allows features to be missing not at random, our main goal is to derive the minimax rate for the excess risk in this problem. In addition to the decomposition property, the rate depends on parameters that control the tail behaviour of the marginal feature distributions, the smoothness of the regression function and a margin condition. The ambient data dimension does not appear in the minimax rate, which can therefore be faster than in the classical nonparametric setting. We further propose the HAM classifier, based on a careful combination of a k-nearest neighbour algorithm and a thresholding step. The HAM classifier attains the minimax rate up to polylogarithmic factors, and numerical experiments further illustrate its utility.
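As a rough illustration of the two ingredients the abstract names (a k-nearest neighbour vote and a thresholding step), the Python sketch below classifies a query point using only the coordinates observed in it, then sets a weak vote to zero before taking its sign. This is not the HAM classifier from the paper, whose construction is not given in this record; the function name `knn_available_case_predict`, the parameters `k` and `tau`, the NaN encoding of missing values and the fallback label are all assumptions made for the example.

```python
import numpy as np


def knn_available_case_predict(X_train, y_train, x_query, k=5, tau=0.0):
    """Toy k-NN vote on the coordinates observed in x_query (NaN = missing).

    Illustrative only: names, defaults and the fallback label are assumptions.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    obs = ~np.isnan(x_query)  # coordinates observed in the query
    if not obs.any():
        return 0  # hypothetical fallback when nothing is observed

    # Keep training rows that are fully observed on those coordinates.
    usable = ~np.isnan(X_train[:, obs]).any(axis=1)
    if not usable.any():
        return 0  # hypothetical fallback when no training row is usable

    Xs, ys = X_train[usable][:, obs], y_train[usable]
    dist = np.linalg.norm(Xs - x_query[obs], axis=1)
    nearest = np.argsort(dist, kind="stable")[: min(k, len(ys))]

    # Local average of labels, centred at 1/2 (an estimate of eta(x) - 1/2).
    centred = ys[nearest].mean() - 0.5

    # Hard threshold: treat a signal no larger than tau as zero.
    if abs(centred) <= tau:
        centred = 0.0
    return int(centred > 0)


X = np.array([[0.0, 0.0], [0.1, np.nan], [1.0, 1.0], [0.9, 1.1], [np.nan, 1.0]])
y = np.array([0, 0, 1, 1, 1])
pred_a = knn_available_case_predict(X, y, [0.05, np.nan], k=2)
pred_b = knn_available_case_predict(X, y, [np.nan, 1.0], k=3)
pred_c = knn_available_case_predict(X, y, [np.nan, 1.0], k=3, tau=0.6)
```

With `tau=0.0` the threshold only removes exact ties; a larger `tau` sends weak votes to the default class 0, which is the sense in which the step acts as a hard threshold in this toy version.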
Pages: 1178-1200
Page count: 23
Related papers
50 in total
  • [1] Bayesian Nonparametric Classification for Incomplete Data With a High Missing Rate: an Application to Semiconductor Manufacturing Data
    Park, Sewon
    Lee, Kyeongwon
    Jeong, Da-Eun
    Ko, Heung-Kook
    Lee, Jaeyong
    IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, 2023, 36 (02) : 170 - 179
  • [2] On nonparametric classification with missing covariates
    Mojirsheibani, Majid
    Montazeri, Zahra
    JOURNAL OF MULTIVARIATE ANALYSIS, 2007, 98 (05) : 1051 - 1071
  • [3] On classification with nonignorable missing data
    Mojirsheibani, Majid
    JOURNAL OF MULTIVARIATE ANALYSIS, 2021, 184
  • [4] Robust nonparametric estimation with missing data
    Boente, Graciela
    Gonzalez-Manteiga, Wenceslao
    Perez-Gonzalez, Ana
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2009, 139 (02) : 571 - 592
  • [5] Nonparametric mean estimation with missing data
    González-Manteiga, W
    Pérez-González, A
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2004, 33 (02) : 277 - 303
  • [6] A Wrapper Feature Selection Approach to Classification with Missing Data
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2016, PT I, 2016, 9597 : 685 - 700
  • [7] PATTERN CLASSIFICATION FORMULATED AS A MISSING DATA TASK: THE AUDIO GENRE CLASSIFICATION CASE
    Pikrakis, Aggelos
    Kopsinis, Yannis
    Chouvardas, Symeon
    Theodoridis, Sergios
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 2026 - 2030
  • [9] Nonparametric spectral analysis with missing data via the EM algorithm
    Wang, YW
    Stoica, P
    Li, J
    Marzetta, TL
    DIGITAL SIGNAL PROCESSING, 2005, 15 (02) : 191 - 206
  • [10] A Nonparametric Test of Missing Completely at Random for Incomplete Multivariate Data
    Li, Jun
    Yu, Yao
    PSYCHOMETRIKA, 2015, 80 (03) : 707 - 726