QDA classification of high-dimensional data with rare and weak signals

Cited by: 0
Authors
Chen, Hanning [1]
Zhao, Qiang [2]
Wu, Jingjing [3]
Affiliations
[1] Univ Melbourne, Sch Math & Stat, Melbourne, Vic, Australia
[2] Shandong Normal Univ, Sch Math & Stat, Jinan, Shandong, Peoples R China
[3] Univ Calgary, Dept Math & Stat, Calgary, AB T2N 1N4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Quadratic discriminant analysis; Two-component mixture models; Rare and weak signals; Classification; Variable selection; Higher Criticism Thresholding;
DOI
10.1007/s11634-023-00576-0
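Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code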
020208 ; 070103 ; 0714 ;
Abstract
This paper addresses the two-class classification problem for data with rare and weak signals under the modern high-dimensional setup p >> n. Considering a two-component mixture of Gaussian features whose components have different random mean vectors with rare and weak signals but a common covariance matrix (the homoscedastic Gaussian case), Fan et al. (Ann Stat 41:2537-2571, 2013) investigated the optimality of linear discriminant analysis (LDA) and proposed an efficient variable selection and classification procedure. We extend their work to the more general scenario in which the two components have different random covariance matrices whose difference consists of rare and weak signals, in order to assess the effect of a difference in covariance matrices on classification. Under this model, we investigate the behaviour of the quadratic discriminant analysis (QDA) classifier. On the theoretical side, we derive the successful and unsuccessful classification regions of QDA. For data with rare signals, variable selection will mostly improve the performance of statistical procedures. Thus, on the implementation side, we propose a variable selection procedure for QDA based on Higher Criticism Thresholding (HCT), which was proved efficient for LDA. In addition, we conduct extensive simulation studies to demonstrate the successful and unsuccessful classification regions of QDA and to evaluate the effectiveness of the proposed HCT-thresholded QDA.
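The abstract names Higher Criticism Thresholding as the variable selection step but does not spell it out. As a rough illustration only, the sketch below applies one commonly stated form of the HC objective, following Donoho and Jin (2008), to a vector of per-feature z-scores. The function name `hct_select`, the search fraction `alpha0 = 0.10`, and the use of two-sided p-values under a standard normal null are assumptions made for this example; it is not the paper's QDA-specific procedure.

```python
import math
import random


def hct_select(z, alpha0=0.10):
    """Sketch of Higher Criticism thresholding on z-scores (assumed form).

    Returns (threshold, selected), where `selected` lists the indices j
    with |z[j]| >= threshold.
    """
    p = len(z)
    if p < 2:
        raise ValueError("need at least two features")
    # Two-sided p-values under N(0, 1): P(|Z| > |z_j|) = erfc(|z_j| / sqrt(2)).
    pvals = [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]
    # Rank features from most to least significant.
    order = sorted(range(p), key=lambda j: pvals[j])
    # Search only the smallest alpha0 * p p-values; stay below k = p so the
    # denominator of the HC objective is never zero.
    k_max = min(max(1, int(alpha0 * p)), p - 1)
    best_hc, best_k = -math.inf, 1
    for k in range(1, k_max + 1):
        frac = k / p
        hc = math.sqrt(p) * (frac - pvals[order[k - 1]]) / math.sqrt(frac * (1.0 - frac))
        if hc > best_hc:
            best_hc, best_k = hc, k
    # The threshold is the |z| value at the HC-maximizing rank.
    threshold = abs(z[order[best_k - 1]])
    selected = [j for j in range(p) if abs(z[j]) >= threshold]
    return threshold, selected


if __name__ == "__main__":
    # Hypothetical data: 10 strong features among 1000, the rest pure noise.
    random.seed(0)
    z = [6.0] * 10 + [random.gauss(0.0, 1.0) for _ in range(990)]
    threshold, selected = hct_select(z)
    print(threshold, len(selected))
```

The threshold is chosen from the data rather than fixed in advance, which is the feature that makes this kind of rule attractive when the useful features are both rare and weak. A classifier such as QDA would then be fitted on the selected features only.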
Pages: 31-65
Number of pages: 35
Related papers
11 items in total
[1]   GLOBAL TESTING UNDER SPARSE ALTERNATIVES: ANOVA, MULTIPLE COMPARISONS AND THE HIGHER CRITICISM [J].
Arias-Castro, Ery ;
Candes, Emmanuel J. ;
Plan, Yaniv .
ANNALS OF STATISTICS, 2011, 39 (05) :2533-2556
[2]   DETECTION OF AN ANOMALOUS CLUSTER IN A NETWORK [J].
Arias-Castro, Ery ;
Candes, Emmanuel J. ;
Durand, Arnaud .
ANNALS OF STATISTICS, 2011, 39 (01) :278-304
[3]   Higher criticism thresholding: Optimal feature selection when useful features are rare and weak [J].
Donoho, David ;
Jin, Jiashun .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2008, 105 (39) :14790-14795
[4]   INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS [J].
Fan, Yingying ;
Lv, Jinchi .
ANNALS OF STATISTICS, 2016, 44 (05) :2098-2126
[5]   INNOVATED INTERACTION SCREENING FOR HIGH-DIMENSIONAL NONLINEAR CLASSIFICATION [J].
Fan, Yingying ;
Kong, Yinfei ;
Li, Daoji ;
Zheng, Zemin .
ANNALS OF STATISTICS, 2015, 43 (03) :1243-1272
[6]   OPTIMAL CLASSIFICATION IN SPARSE GAUSSIAN GRAPHIC MODEL [J].
Fan, Yingying ;
Jin, Jiashun ;
Yao, Zhigang .
ANNALS OF STATISTICS, 2013, 41 (05) :2537-2571
[7]  
Jiang BY, 2018, J MACH LEARN RES, V19
[8]   SPARSE QUADRATIC DISCRIMINANT ANALYSIS FOR HIGH DIMENSIONAL DATA [J].
Li, Quefeng ;
Shao, Jun .
STATISTICA SINICA, 2015, 25 (02) :457-473
[9]   SPARSE LINEAR DISCRIMINANT ANALYSIS BY THRESHOLDING FOR HIGH DIMENSIONAL DATA [J].
Shao, Jun ;
Wang, Yazhen ;
Deng, Xinwei ;
Wang, Sijian .
ANNALS OF STATISTICS, 2011, 39 (02) :1241-1265
[10]   Semiparametric mixture: Continuous scale mixture approach [J].
Xiang, Sijia ;
Yao, Weixin ;
Seo, Byungtae .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2016, 103 :413-425