Supervised clustering of variables based on Gram-Schmidt transformation

Cited: 0
Authors
Liu R. [1 ]
Wang H. [1 ,2 ]
Wang S. [1 ,3 ]
Institutions
[1] School of Economics and Management, Beihang University, Beijing
[2] Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing
[3] Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, Beijing
Source
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics | 2019, Vol. 45, No. 10
Funding
National Natural Science Foundation of China
Keywords
Dimension reduction; Gram-Schmidt transformation; High correlation; Regression; Variable clustering;
DOI
10.13700/j.bh.1001-5965.2019.0050
Abstract
To further investigate regression-based dimension reduction for high-dimensional data, a supervised clustering of variables algorithm based on the Gram-Schmidt transformation (SCV-GS) is proposed. Unlike hierarchical clustering of variables around latent components, SCV-GS takes the key variables selected sequentially by a variable-screening procedure as the cluster centers. High correlation among variables is handled through the Gram-Schmidt transformation, from which the clustering result is obtained. In addition, drawing on the idea of partial least squares, a new "homogeneity" criterion is proposed for selecting the optimal clustering parameter. SCV-GS not only produces the variable clustering quickly, but also identifies the variable groups most relevant to the response and the structure through which those variables influence it. Simulation results show that SCV-GS is substantially faster, and that the estimated regression coefficients of the latent variables agree with those of the comparison method. Real-data analysis shows that SCV-GS performs better in both interpretation and prediction. © 2019, Editorial Board of JBUAA. All rights reserved.
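The core operation the abstract describes is decorrelating the remaining variables against sequentially selected key variables via the Gram-Schmidt transformation. The sketch below illustrates only that step, not the full SCV-GS algorithm; the function name, interface, and the choice of key indices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gram_schmidt_residualize(X, key_idx):
    """Orthogonalize every column of X against the key variables, in order.

    X        : (n, p) data matrix, columns = variables.
    key_idx  : indices of the key variables chosen by the screening step
               (hypothetical input; the paper's screening rule is not shown here).
    Returns the residualized matrix R and the orthonormal key directions Q.
    """
    X = np.asarray(X, dtype=float)
    R = X.copy()
    basis = []
    for j in key_idx:
        v = R[:, j].copy()
        norm = np.linalg.norm(v)
        if norm < 1e-12:          # key variable already explained by earlier keys
            continue
        q = v / norm
        # Classical Gram-Schmidt step: remove the component along q
        # from every column, including the key column itself.
        R = R - np.outer(q, q @ R)
        basis.append(q)
    Q = np.column_stack(basis)
    return R, Q
```

After this transformation, each residual column is orthogonal to all processed key directions, so the remaining variables can be assigned to clusters (e.g., by their correlation with each key variable) without the distortion caused by high collinearity.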
Pages: 2003-2010 (7 pages)
Related References (23 total)
  • [1] Tibshirani R., Regression shrinkage and selection via the lasso: A retrospective, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 3, pp. 273-282, (2011)
  • [2] Zou H., Hastie T., Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 2, pp. 301-320, (2005)
  • [3] Fan J.Q., Lv J.C., Sure independence screening for ultrahigh dimensional feature space, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 5, pp. 849-911, (2008)
  • [4] Wang H.S., Forward regression for ultra-high dimensional variable screening, Journal of the American Statistical Association, 104, 488, pp. 1512-1524, (2009)
  • [5] Zou H., Hastie T., Tibshirani R., Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15, 2, pp. 265-286, (2006)
  • [6] Chun H., Keles S., Sparse partial least squares regression for simultaneous dimension reduction and variable selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72, 1, pp. 3-25, (2010)
  • [7] Chen M.K., Vigneau E., Supervised clustering of variables, Advances in Data Analysis and Classification, 10, 1, pp. 85-101, (2016)
  • [8] Jolliffe I.T., Discarding variables in a principal component analysis. I: Artificial data, Applied Statistics, 21, 2, pp. 160-173, (1972)
  • [9] Hastie T., Tibshirani R., Botstein D., Et al., Supervised harvesting of expression trees, Genome Biology, 2, 1, (2001)
  • [10] Vigneau E., Qannari E., Clustering of variables around latent components, Communications in Statistics-Simulation and Computation, 32, 4, pp. 1131-1150, (2003)