Modified linear discriminant analysis using block covariance matrix in high-dimensional data

Cited by: 1
Authors
Nam, Jin Hyun [1]
Kim, Donguk [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Stat, Seoul, South Korea
Keywords
Block covariance matrix; Classification; High-dimension; Modified LDA; SUPPORT VECTOR MACHINES; GENE-EXPRESSION DATA; PROSTATE-CANCER; CLASSIFICATION; PREDICTION
DOI
10.1080/03610918.2015.1014103
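Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract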
Among the many classification methods, linear discriminant analysis (LDA) is favored for its simplicity, robustness, and predictive accuracy. However, when the number of genes is larger than the number of observations, LDA cannot be applied directly because the within-class covariance matrix is singular. Diagonal LDA (DLDA) is a simpler model than LDA and performs better in some cases; in reality, however, DLDA requires a strong assumption of mutual independence. In this article, we propose the modified LDA (MLDA). MLDA is based on independence but also uses the information in the dependence structure that affects classification performance. We suggest two approaches: one uses gene rank, and the other does not. In both the real data analysis and the simulation study, MLDA performed better than LDA, DLDA, and K-nearest neighbors, and was comparable with support vector machines.
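The general idea of sitting between full LDA and DLDA can be illustrated with a block-diagonal pooled covariance. The sketch below is not the authors' MLDA procedure, whose details (including the gene-rank step) are not given in the abstract. It assumes the feature blocks are supplied by the caller, and it adds a small ridge term (a hypothetical `ridge` parameter) so that each block stays invertible. The names `fit_block_lda` and `predict_block_lda` are illustrative only.

```python
import numpy as np


def fit_block_lda(X, y, blocks, ridge=1e-3):
    """Fit LDA with a block-diagonal pooled within-class covariance.

    X: (n, p) data matrix; y: (n,) class labels.
    blocks: list of index lists partitioning the p features (assumed given).
    ridge: small diagonal term added to each block (an assumption of this sketch).
    """
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([(y == c).mean() for c in classes])

    # Center every observation by its own class mean (pooled residuals).
    resid = X - means[np.searchsorted(classes, y)]
    dof = X.shape[0] - len(classes)

    # Solve Sigma_b^{-1} mu_k one block at a time; cross-block
    # covariances are treated as zero.
    coef = np.zeros_like(means)
    for idx in blocks:
        S = resid[:, idx].T @ resid[:, idx] / dof
        S += ridge * np.eye(len(idx))
        coef[:, idx] = np.linalg.solve(S, means[:, idx].T).T

    # Linear discriminant score: x' S^{-1} mu_k - 0.5 mu_k' S^{-1} mu_k + log prior_k
    intercept = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return classes, coef, intercept


def predict_block_lda(model, X):
    """Assign each row of X to the class with the largest discriminant score."""
    classes, coef, intercept = model
    scores = X @ coef.T + intercept
    return classes[np.argmax(scores, axis=1)]
```

With one block containing all features this reduces to ridge-regularized LDA, and with one feature per block it reduces to a DLDA-style rule, so the block sizes control how much of the dependence structure is used.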
Pages: 1796-1807
Number of pages: 12