Asymptotic properties of distance-weighted discrimination and its bias correction for high-dimension, low-sample-size data

Authors
Kento Egashira
Kazuyoshi Yata
Makoto Aoshima
Affiliations
[1] University of Tsukuba, Degree Programs in Pure and Applied Sciences, Graduate School of Science and Technology
[2] University of Tsukuba, Institute of Mathematics
Source
Japanese Journal of Statistics and Data Science | 2021 / Vol. 4
Keywords
Bias-corrected DWD; Discriminant analysis; HDLSS; Large p, small n; Weighted DWD
DOI
Not available
Abstract
Distance-weighted discrimination (DWD) was proposed to improve on the support vector machine in high-dimensional settings. However, the DWD is known to be quite sensitive to imbalance between the sample sizes. In this paper, we study asymptotic properties of the DWD in high-dimension, low-sample-size (HDLSS) settings. We show that the DWD carries a large bias caused by heterogeneity of the covariance matrices as well as by sample imbalance. We propose a bias-corrected DWD (BC-DWD) and show that the BC-DWD can enjoy consistency properties with respect to misclassification rates. We also consider the weighted DWD (WDWD) and propose an optimal choice of weights for it. Finally, we discuss the performance of the BC-DWD and of the WDWD with the optimal weights in numerical simulations and real data analyses.
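The standard (unweighted) DWD of Marron, Todd and Ahn (2007) is the classifier analysed above, and the BC-DWD and WDWD build on it. For orientation, it is commonly stated as the convex program below. The setup and symbols are our own assumptions, not the paper's notation:

- training pairs (x_i, y_i) with labels y_i in {-1, +1};
- a penalty parameter C > 0;
- slack variables xi_i.

A new point x is then assigned to class +1 when w^T x + b > 0, and to class -1 otherwise. This sketch is the textbook formulation only. It does not reproduce the bias correction or the optimal weights proposed in this paper.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi}\quad & \sum_{i=1}^{n} \frac{1}{r_i} \;+\; C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & r_i = y_i\,(w^\top x_i + b) + \xi_i,\quad r_i > 0,\quad \xi_i \ge 0,\qquad i = 1, \dots, n, \\
& \|w\| \le 1 .
\end{aligned}
```

Unlike the hinge loss of the support vector machine, the reciprocal terms 1/r_i let every training point influence the separating direction, not only the points nearest the boundary.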
Pages: 821-840
Number of pages: 19