Variance constrained partial least squares

Cited by: 11
Authors
Jiang, Xiubao [1 ]
You, Xinge [1 ]
Yu, Shujian [2 ]
Tao, Dacheng [3 ,4 ]
Chen, C. L. Philip [5 ]
Cheung, Yiu-ming [6 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Hubei, Peoples R China
[2] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL USA
[3] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst, Sydney, NSW 2007, Australia
[4] Univ Technol Sydney, Fac Engn & Informat Technol, Sydney, NSW 2007, Australia
[5] Univ Macau, Fac Sci & Technol, Macau, Peoples R China
[6] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Keywords
Partial least squares; Near-infrared spectroscopy; Latent variable; Chemometrics; Feature subset selection; Variable selection; Linear regression; Signal correction; PLS; Shrinkage; Prediction; Spectra
DOI
10.1016/j.chemolab.2015.04.014
CLC Classification Number
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
Partial least squares (PLS) regression has achieved desirable performance for modeling the relationship between a set of dependent (response) variables and another set of independent (predictor) variables, especially when the sample size is small relative to the dimension of these variables. In each iteration, PLS finds two latent variables, one from the dependent and one from the independent variables, by maximizing the product of three factors: the variances of the two latent variables and the squared correlation between them. In this paper, we derive the mathematical relationship between the mean square error (MSE) and these three factors, and find that the MSE is not monotonic in their product. However, the corresponding optimization problem is difficult to solve if the optimal latent variables are extracted directly from this relationship. To address these problems, a novel multilinear regression model, variance-constrained partial least squares (VCPLS), is proposed. VCPLS finds the latent variables by maximizing the product of the variance of the latent variable from the dependent variables and the squared correlation between the two latent variables, while constraining the variance of the latent variable from the independent variables to be larger than a predetermined threshold. The corresponding optimization problem can be solved efficiently, and the latent variables extracted by VCPLS are near-optimal. Compared with classical PLS and its variants, VCPLS achieves lower prediction error in the sense of MSE. Experiments are conducted on three near-infrared (NIR) spectroscopy data sets. To demonstrate the applicability of the proposed VCPLS, experiments are also conducted on another data set with different characteristics from the NIR data. Experimental results verify the superiority of the proposed VCPLS. (C) 2015 Elsevier B.V. All rights reserved.
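The three-factor objective the abstract refers to is what classical NIPALS-style PLS maximizes in each iteration: the weight vectors found by the power-iteration-like inner loop maximize cov(t, u)^2 = var(t) * var(u) * corr(t, u)^2, where t and u are the latent variables from the predictor and response blocks. The following is a minimal illustrative sketch of that classical extraction step only (it is NOT the paper's VCPLS, which adds the variance constraint; the function name and defaults are our own):

```python
import numpy as np

def pls_components(X, Y, n_components=2, max_iter=500, tol=1e-10):
    """NIPALS-style extraction of PLS latent variables t = X w, u = Y c.

    Each outer iteration finds unit weight vectors w, c that maximize
    cov(t, u)^2 = var(t) * var(u) * corr(t, u)^2, then deflates X.
    Returns the matrix of X-side score vectors (one column per component).
    """
    X = X - X.mean(axis=0)          # center both blocks
    Y = Y - Y.mean(axis=0)
    scores = []
    for _ in range(n_components):
        u = Y[:, :1].copy()          # initialize u with a response column
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)   # unit weight vector for X block
            t = X @ w                # latent variable from predictors
            c = Y.T @ t
            c /= np.linalg.norm(c)   # unit weight vector for Y block
            u_new = Y @ c            # latent variable from responses
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        p = X.T @ t / (t.T @ t)      # X loading for this component
        X = X - t @ p.T              # deflate X; makes scores orthogonal
        scores.append(t.ravel())
    return np.array(scores).T
```

Because X is deflated with its own loading after each component, successive score vectors come out mutually orthogonal, which is the property that makes the per-component variance/correlation decomposition discussed in the abstract well defined.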
Pages: 60-71 (12 pages)
Cited References: 59