Bayesian optimization with approximate set kernels

Authors
Jungtaek Kim
Michael McCourt
Tackgeun You
Saehoon Kim
Seungjin Choi
Affiliations
[1] POSTECH
[2] SigOpt, an Intel company
[3] Kakao Brain
[4] BARO AI
Source
Machine Learning | 2021 / Volume 110
Keywords
Global optimization; Bayesian optimization; Set optimization
Abstract
We propose a practical Bayesian optimization method over sets, for minimizing a black-box function that takes a set as its single input. Because set inputs are permutation-invariant, traditional Gaussian process-based Bayesian optimization strategies that assume vector inputs can fall short. To address this, we develop a Bayesian optimization method that uses a set kernel to build surrogate functions. This kernel accumulates similarity over set elements to enforce permutation invariance, but at greater computational cost. To reduce this burden, we propose two key components: (i) a more efficient approximate set kernel that remains positive definite and is an unbiased estimator of the true set kernel, with variance upper-bounded in terms of the number of subsamples; and (ii) a constrained acquisition function optimization over sets, which exploits the symmetry of the feasible region that defines a set input. Finally, we present several numerical experiments demonstrating that our method outperforms other methods.
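The set kernel sketched in the abstract can be understood as averaging a base kernel over all pairs of elements from the two sets, which makes the result invariant to element ordering; subsampling elements then gives a cheaper unbiased estimate. A minimal illustration of this idea (the RBF base kernel and all function names here are assumptions for exposition, not the paper's code):

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Base kernel between two vector-valued set elements.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def set_kernel(X, Y, gamma=1.0):
    # Permutation-invariant set kernel: average base-kernel similarity
    # over all pairs of elements (one from X, one from Y).
    # Cost is O(|X| * |Y|) base-kernel evaluations.
    return np.mean([rbf(x, y, gamma) for x in X for y in Y])

def approx_set_kernel(X, Y, m, rng, gamma=1.0):
    # Cheaper estimate: uniformly subsample m elements (without
    # replacement) from each set before computing the set kernel.
    Xs = X[rng.choice(len(X), m, replace=False)]
    Ys = Y[rng.choice(len(Y), m, replace=False)]
    return set_kernel(Xs, Ys, gamma)
```

Because the double sum treats every element pair symmetrically, shuffling the elements of either set leaves the kernel value unchanged, and subsampling uniformly keeps the estimator's expectation equal to the exact value while reducing the pairwise cost from O(|X||Y|) to O(m²).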
Pages: 857–879 (22 pages)