Partial least squares fusing unsupervised learning

Cited by: 7
Author
Yoo, Jae Keun [1]
Affiliation
[1] Ewha Womans Univ, Dept Stat, Seoul 03760, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Cluster analysis; Fused approach; Large p small n; Multivariate analysis; Partial least squares; Unsupervised learning; Sufficient dimension reduction; Regression
DOI
10.1016/j.chemolab.2017.12.016
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper proposes fused clustered least squares (FCLS), a partial least squares approach that incorporates unsupervised learning. The K-means clustering algorithm is adopted as the unsupervised method, and it clusters either the original predictors or their principal components. This unsupervised step uncovers unknown structure in the predictors, and that information is then used in their further reduction. Within each cluster, the covariance of the response and the predictors is computed and successively projected onto the covariance matrix of the predictors; this is called clustered least squares. The clustered least squares from the various numbers of clusters are then fused. FCLS thus combines supervised and unsupervised statistical methods, and it overcomes the deficits that ordinary least squares, including its popular variant partial least squares, has in practice. Numerical studies support the theory, and an application to near-infrared spectroscopy data confirms the potential advantage of FCLS in practice.
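The abstract gives the procedure only in outline, so the following is a loose NumPy sketch of one possible reading, not the paper's implementation. Several choices are assumptions made for illustration: "successively projected onto the covariance matrix" is read as a Krylov-type sequence (s, Σs, Σ²s, ...), the pooled predictor covariance is used for Σ, and fusing is done by stacking all blocks and keeping the leading left singular vectors. The function names (`kmeans_labels`, `krylov_block`, `fused_basis`) and the parameters `max_k`, `u`, and `d` are hypothetical.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every observation to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def krylov_block(X, y, Sigma, u):
    """Within-cluster cov(X, y), multiplied repeatedly by Sigma (assumed reading)."""
    Xc = X - X.mean(axis=0)
    s = Xc.T @ (y - y.mean()) / (len(y) - 1)
    cols = [s]
    for _ in range(u - 1):
        cols.append(Sigma @ cols[-1])
    return np.column_stack(cols)

def fused_basis(X, y, max_k=3, u=2, d=1, seed=0):
    """Stack the blocks from k = 1..max_k clusters and keep d leading directions."""
    Sigma = np.cov(X, rowvar=False)  # pooled predictor covariance (assumption)
    blocks = []
    for k in range(1, max_k + 1):
        labels = kmeans_labels(X, k, seed=seed)
        for j in range(k):
            idx = labels == j
            if idx.sum() > 1:  # a covariance needs at least two observations
                blocks.append(krylov_block(X[idx], y[idx], Sigma, u))
    M = np.hstack(blocks)
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :d]

# Toy "large p, small n" data: 40 observations, 100 predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 100))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)

B = fused_basis(X, y, max_k=3, u=2, d=1)
Z = X @ B  # reduced predictors for a subsequent regression
```

Note that no step inverts the predictor covariance matrix, which is what makes a construction of this kind usable when the number of predictors exceeds the sample size.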
Pages: 82-86
Number of pages: 5