NONPARAMETRIC CLASSIFICATION WITH MISSING DATA

Cited by: 0
Authors
Sell, Torben [1 ,2 ]
Berrett, Thomas B. [3 ]
Cannings, Timothy I. [1 ,2 ]
Affiliations
[1] Univ Edinburgh, Sch Math, Edinburgh, Scotland
[2] Univ Edinburgh, Maxwell Inst Math Sci, Edinburgh, Scotland
[3] Univ Warwick, Dept Stat, Coventry, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Missing data; classification; minimax; MINIMAX RATE; DISCRIMINATION;
DOI
10.1214/24-AOS2389
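Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes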
020208 ; 070103 ; 0714 ;
Abstract
We introduce a new nonparametric framework for classification problems in the presence of missing data. The key aspect of our framework is that the regression function decomposes into an anova-type sum of orthogonal functions, of which some (or even many) may be zero. Working under a general missingness setting, which allows features to be missing not at random, our main goal is to derive the minimax rate for the excess risk in this problem. In addition to the decomposition property, the rate depends on parameters that control the tail behaviour of the marginal feature distributions, the smoothness of the regression function and a margin condition. The ambient data dimension does not appear in the minimax rate, which can therefore be faster than in the classical nonparametric setting. We further propose a new method, called the Hard-thresholding Anova Missing data (HAM) classifier, based on a careful combination of a k-nearest neighbour algorithm and a thresholding step. The HAM classifier attains the minimax rate up to polylogarithmic factors and numerical experiments further illustrate its utility.
Pages: 1178-1200
Page count: 23