NONPARAMETRIC CLASSIFICATION WITH MISSING DATA

Citations: 0
Authors
Sell, Torben [1, 2]
Berrett, Thomas B. [3]
Cannings, Timothy I. [1, 2]
Affiliations
[1] Univ Edinburgh, Sch Math, Edinburgh, Scotland
[2] Univ Edinburgh, Maxwell Inst Math Sci, Edinburgh, Scotland
[3] Univ Warwick, Dept Stat, Coventry, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Missing data; classification; minimax; MINIMAX RATE; DISCRIMINATION;
DOI
10.1214/24-AOS2389
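Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract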
We introduce a new nonparametric framework for classification problems in the presence of missing data. The key aspect of our framework is that the regression function decomposes into an ANOVA-type sum of orthogonal functions, of which some (or even many) may be zero. Working under a general missingness setting, which allows features to be missing not at random, our main goal is to derive the minimax rate for the excess risk in this problem. In addition to the decomposition property, the rate depends on parameters that control the tail behaviour of the marginal feature distributions, the smoothness of the regression function and a margin condition. The ambient data dimension does not appear in the minimax rate, which can therefore be faster than in the classical nonparametric setting. We further propose the HAM classifier, based on a careful combination of a k-nearest neighbour algorithm and a thresholding step. It attains the minimax rate up to polylogarithmic factors, and numerical experiments further illustrate its utility.
Pages: 1178 - 1200
Number of pages: 23