Bayesian optimization with approximate set kernels

Authors
Jungtaek Kim
Michael McCourt
Tackgeun You
Saehoon Kim
Seungjin Choi
Affiliations
[1] POSTECH
[2] SigOpt, an Intel company
[3] Kakao Brain
[4] BARO AI
Source
Machine Learning | 2021 / Volume 110
Keywords
Global optimization; Bayesian optimization; Set optimization
Abstract
We propose a practical Bayesian optimization method over sets, for minimizing a black-box function that takes a set as its single input. Because set inputs are permutation-invariant, traditional Gaussian process-based Bayesian optimization strategies that assume vector inputs can fall short. To address this, we develop a Bayesian optimization method that uses a set kernel to build surrogate functions. This kernel accumulates similarity over set elements to enforce permutation invariance, but at greater computational cost. To reduce this burden, we propose two key components: (i) a more efficient approximate set kernel that remains positive definite and is an unbiased estimator of the true set kernel, with variance upper-bounded in terms of the number of subsamples; and (ii) a constrained acquisition function optimization over sets, which exploits the symmetry of the feasible region that defines a set input. Finally, we present several numerical experiments demonstrating that our method outperforms other methods.
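The set kernel sketched in the abstract can be understood as averaging a base kernel over all pairs of elements from the two sets, which makes the result invariant to element ordering; subsampling elements then gives a cheaper unbiased estimate. A minimal illustration of this idea (the RBF base kernel and all function names here are assumptions for exposition, not the paper's code):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Base kernel between two vector-valued set elements.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def set_kernel(X, Y, gamma=1.0):
    # Permutation-invariant set kernel: average base-kernel similarity
    # over all pairs of elements (one from X, one from Y).
    # Cost is O(|X| * |Y|) base-kernel evaluations.
    return np.mean([rbf(x, y, gamma) for x in X for y in Y])

def approx_set_kernel(X, Y, m, rng, gamma=1.0):
    # Cheaper estimate: uniformly subsample m elements (without
    # replacement) from each set before computing the set kernel.
    Xs = X[rng.choice(len(X), m, replace=False)]
    Ys = Y[rng.choice(len(Y), m, replace=False)]
    return set_kernel(Xs, Ys, gamma)
```

Because the double sum treats every element pair symmetrically, shuffling the elements of either set leaves the kernel value unchanged, and subsampling uniformly keeps the estimator's expectation equal to the exact value while reducing the pairwise cost from O(|X||Y|) to O(m²).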
Pages: 857–879 (22 pages)