EFFICIENT GAUSSIAN PROCESS MODELING USING EXPERIMENTAL DESIGN-BASED SUBAGGING

Cited by: 14
Authors
Zhao, Yibo [1]
Amemiya, Yasuo [2]
Hung, Ying [1]
Affiliations
[1] Rutgers State Univ, Dept Stat & Biostat, Piscataway, NJ 08854 USA
[2] IBM TJ Watson Res Ctr, Stat Anal & Forecasting, Yorktown Hts, NY 10598 USA
Keywords
Bagging; computer experiment; experimental design; Gaussian process; Latin hypercube design; model selection; MAXIMUM-LIKELIHOOD-ESTIMATION; NONCONCAVE PENALIZED LIKELIHOOD; VARIABLE SELECTION; BLOCK BOOTSTRAP; COVARIANCE; APPROXIMATION; EMULATORS
DOI
10.5705/ss.202016.0250
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
We address two important issues in Gaussian process (GP) modeling: how to reduce the computational complexity of GP modeling, and how to simultaneously perform variable selection and estimation for the mean function of GP models. Estimation is computationally intensive for GP models because it relies heavily on manipulating an n-by-n correlation matrix, where n is the sample size. Conventional penalized likelihood approaches are widely used for variable selection. However, the computational cost of penalized maximum likelihood estimation (PMLE) or the corresponding one-step sparse estimation (OSE) can be prohibitively high as the sample size becomes large, especially for GP models. To address both issues, this article proposes an efficient subsample aggregating (subagging) approach with an experimental design-based subsampling scheme. The proposed method is computationally cheaper than estimation on the full data, yet the resulting subagging estimators can be shown to achieve the same asymptotic efficiency as the original PMLE and OSE. The finite-sample performance is examined through simulation studies. Application of the proposed methodology to a data center thermal study reveals some interesting information, including the identification of an efficient cooling mechanism.
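The abstract describes the approach only at a high level. As a rough illustration, and not the authors' algorithm, the Python sketch below shows how subagging replaces one fit on all n points (whose n-by-n correlation matrix costs O(n^3) to factor) with B fits on subsamples of size at most m, each costing O(m^3), followed by averaging. Several simplifications are assumptions made here: inputs are scaled to [0, 1]^d; the GP has a linear mean (1, u)β and a Gaussian correlation whose parameters θ and nugget are fixed rather than estimated; β is estimated by generalized least squares with no penalty, so the sketch illustrates the aggregation step, not PMLE or OSE; and each subsample takes one point from each cell of a random Latin hypercube on an m^d grid, skipping empty cells. All function names are hypothetical.

```python
import math
import random


def gaussian_corr(a, b, theta):
    """Gaussian correlation exp(-sum_j theta_j (a_j - b_j)^2) between two inputs."""
    return math.exp(-sum(t * (ai - bi) ** 2 for t, ai, bi in zip(theta, a, b)))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def cell_of(u, m):
    """Grid cell containing u, with [0, 1] split into m equal bins per dimension."""
    return tuple(min(int(uj * m), m - 1) for uj in u)


def build_cells(U, m):
    """Bucket the indices of the data points by grid cell (done once, O(n))."""
    cells = {}
    for i, u in enumerate(U):
        cells.setdefault(cell_of(u, m), []).append(i)
    return cells


def lhd_subsample(cells, m, d, rng):
    """Assumed scheme: one random point from each cell of a random Latin hypercube
    on the m^d grid, so each bin of each dimension is used at most once.
    Empty cells are simply skipped in this sketch."""
    perms = [rng.sample(range(m), m) for _ in range(d)]  # one permutation per dimension
    chosen = []
    for k in range(m):
        key = tuple(perm[k] for perm in perms)
        if key in cells:
            chosen.append(rng.choice(cells[key]))
    return chosen


def gls_beta(U, y, theta, nugget):
    """GLS estimate of beta in the mean (1, u) beta; costs O(m^3) for m points.
    Assumption: correlation parameters theta and nugget are fixed, not estimated."""
    m = len(U)
    R = [[gaussian_corr(U[i], U[j], theta) + (nugget if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    F = [[1.0] + list(u) for u in U]  # m x p design matrix
    p = len(F[0])
    Rinv_y = solve(R, y)
    Rinv_F = [solve(R, [F[i][k] for i in range(m)]) for k in range(p)]  # columns of R^-1 F
    # Normal equations: (F' R^-1 F) beta = F' R^-1 y
    A = [[sum(F[i][a] * Rinv_F[b][i] for i in range(m)) for b in range(p)] for a in range(p)]
    rhs = [sum(F[i][a] * Rinv_y[i] for i in range(m)) for a in range(p)]
    return solve(A, rhs)


def subagging_beta(U, y, m, n_sub, theta, nugget, seed=0):
    """Average the GLS estimates from n_sub Latin-hypercube subsamples of size <= m."""
    rng = random.Random(seed)
    cells = build_cells(U, m)
    estimates = []
    for _ in range(n_sub):
        idx = lhd_subsample(cells, m, len(U[0]), rng)
        estimates.append(gls_beta([U[i] for i in idx], [y[i] for i in idx], theta, nugget))
    return [sum(col) / n_sub for col in zip(*estimates)]  # coordinate-wise average


if __name__ == "__main__":
    rng = random.Random(1)
    U = [(rng.random(), rng.random()) for _ in range(2000)]  # synthetic inputs in [0, 1]^2
    y = [1.0 + 2.0 * u1 - 3.0 * u2 + rng.gauss(0.0, 0.02) for u1, u2 in U]
    print(subagging_beta(U, y, m=25, n_sub=20, theta=(5.0, 5.0), nugget=0.1))
```

Running the script prints an averaged estimate that should lie close to the coefficients (1, 2, -3) used to simulate the synthetic data. The linear algebra is written in pure Python only to keep the sketch dependency-free; a practical implementation would use a Cholesky factorization from a numerical library and would also estimate the correlation parameters.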
Pages: 1459-1479
Number of pages: 21