A U-classifier for high-dimensional data under non-normality

Cited by: 1
Authors
Ahmad, M. Rauf
Pavlenko, Tatjana
Affiliations
[1] Uppsala Univ, Dept Stat, Uppsala, Sweden
[2] KTH, Royal Inst Technol, Dept Math, Stockholm, Sweden
Keywords
Bias-adjusted classifier; High-dimensional classification; U-statistics; Linear discriminant analysis; Gene expression data; Statistics; Multiclass; Rules; Tests
DOI
10.1016/j.jmva.2018.05.008
CLC classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
A classifier for two or more samples is proposed when the data are high-dimensional and the distributions may be non-normal. The classifier is constructed as a linear combination of two easily computable and interpretable components, the U-component and the P-component. The U-component is a linear combination of U-statistics of bilinear forms of pairwise distinct vectors from independent samples. The P-component, the discriminant score, is a function of the projection of the U-component on the observation to be classified. Together, the two components constitute an inherently bias-adjusted classifier valid for high-dimensional data. The classifier is linear but its linearity does not rest on the assumption of homoscedasticity. Properties of the classifier and its normal limit are given under mild conditions. Misclassification errors and asymptotic properties of their empirical counterparts are discussed. Simulation results are used to show the accuracy of the proposed classifier for small or moderate sample sizes and large dimensions. Applications involving real data sets are also included. (C) 2018 Elsevier Inc. All rights reserved.
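The abstract's construction can be illustrated with a minimal sketch. Assuming the U-component estimates each squared mean norm μᵢ'μᵢ without bias via a U-statistic over bilinear forms of pairwise distinct observations, and the P-component projects onto the new observation through the sample mean, a bias-adjusted Euclidean-type rule follows. This is an illustrative reconstruction, not the paper's exact estimator; all function names are hypothetical.

```python
import numpy as np

def u_component(X):
    """Unbiased U-statistic estimate of mu'mu for one sample.

    Averages the bilinear forms x_k' x_l over pairwise distinct
    indices k != l, which removes the E||x||^2 bias that the plug-in
    estimate ||xbar||^2 would carry in high dimensions.
    """
    n = X.shape[0]
    G = X @ X.T                      # Gram matrix of inner products
    return (G.sum() - np.trace(G)) / (n * (n - 1))

def classify(x, samples):
    """Assign x to the class minimizing a bias-adjusted score.

    Score_i(x) = U_i - 2 x' xbar_i is an unbiased estimate of
    ||x - mu_i||^2 - ||x||^2; the common ||x||^2 term cancels across
    classes, so minimizing it mimics nearest-mean classification
    without assuming homoscedasticity (a sketch of the idea only).
    """
    scores = [u_component(X) - 2.0 * x @ X.mean(axis=0) for X in samples]
    return int(np.argmin(scores))
```

With the dimension far exceeding the sample sizes (e.g. p = 100, nᵢ = 20), the rule still behaves sensibly because each component is computable and unbiased regardless of p.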
Pages: 269-283 (15 pages)