EFFICIENT GAUSSIAN PROCESS MODELING USING EXPERIMENTAL DESIGN-BASED SUBAGGING

Cited by: 14

Authors:

Zhao, Yibo [1]

Amemiya, Yasuo [2]

Hung, Ying [1]

Affiliations:

[1] Rutgers State Univ, Dept Stat & Biostat, Piscataway, NJ 08854 USA

[2] IBM TJ Watson Res Ctr, Stat Anal & Forecasting, Yorktown Hts, NY 10598 USA

Source:

STATISTICA SINICA | 2018 / Vol. 28 / No. 03

Keywords:

Bagging; computer experiment; experimental design; Gaussian process; Latin hypercube design; model selection; MAXIMUM-LIKELIHOOD-ESTIMATION; NONCONCAVE PENALIZED LIKELIHOOD; VARIABLE SELECTION; BLOCK BOOTSTRAP; COVARIANCE; APPROXIMATION; EMULATORS;

DOI:

10.5705/ss.202016.0250

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

We address two important issues in Gaussian process (GP) modeling. One is how to reduce the computational complexity of GP modeling, and the other is how to simultaneously perform variable selection and estimation for the mean function of GP models. Estimation is computationally intensive for GP models because it relies heavily on manipulations of an n-by-n correlation matrix, where n is the sample size. Conventional penalized likelihood approaches are widely used for variable selection. However, the computational cost of the penalized likelihood estimation (PMLE) or the corresponding one-step sparse estimation (OSE) can be prohibitively high as the sample size becomes large, especially for GP models. To address both issues, this article proposes an efficient subsample aggregating (sub-agging) approach with an experimental design-based subsampling scheme. The proposed method is computationally cheaper, yet it can be shown that the resulting subagging estimators asymptotically achieve the same efficiency as the original PMLE and OSE. The finite-sample performance is examined through simulation studies. Application of the proposed methodology to a data center thermal study reveals some interesting information, including the identification of an efficient cooling mechanism.


Pages: 1459-1479

Page count: 21
