Minimizing Convex Functionals over Space of Probability Measures via KL Divergence Gradient Flow

Cited: 0
Authors
Yao, Rentian [1 ]
Huang, Linjun [1 ]
Yang, Yun [1 ]
Affiliations
[1] UIUC, Champaign, IL 61820 USA
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, Vol. 238, 2024
Keywords
CONVERGENCE; SMOOTH
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Motivated by the computation of the non-parametric maximum likelihood estimator (NPMLE) and the Bayesian posterior in statistics, this paper explores the problem of convex optimization over the space of all probability distributions. We introduce an implicit scheme, called the implicit KL proximal descent (IKLPD) algorithm, for discretizing a continuous-time gradient flow relative to the Kullback-Leibler (KL) divergence for minimizing a convex target functional. We show that IKLPD converges to a global optimum at a polynomial rate from any initialization; moreover, if the objective functional is strongly convex relative to the KL divergence, for example, when the target functional itself is a KL divergence as in the context of Bayesian posterior computation, IKLPD exhibits globally exponential convergence. Computationally, we propose a numerical method based on normalizing flows to realize IKLPD. In the other direction, our numerical method can also be viewed as a new approach that sequentially trains a normalizing flow to minimize a convex functional, with strong theoretical guarantees.
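As a minimal sketch of the implicit scheme described in the abstract (the symbols F for the convex target functional, mu_k for the iterates, and eta_k for the step sizes are notation introduced here for illustration, not taken from this record), one standard reading of an implicit KL proximal step is

\[
\mu_{k+1} \in \operatorname*{arg\,min}_{\mu \in \mathcal{P}(\mathcal{X})} \Big\{ F(\mu) + \tfrac{1}{\eta_k}\, \mathrm{KL}(\mu \,\|\, \mu_k) \Big\},
\]

so each iteration balances decreasing F against staying close to the current iterate in KL divergence, with the continuous-time KL gradient flow recovered in the limit eta_k -> 0.

To suggest how a normalizing flow could realize such a step, the following toy PyTorch sketch runs the update for the Bayesian-posterior case F(q) = KL(q || pi) with a Gaussian target pi = N(2, 1). It is an assumption-laden illustration rather than the authors' implementation: the single affine transform stands in for a full normalizing flow, and the step size, sample size, and inner-loop settings are arbitrary choices.

import math
import torch

def V(x):
    # Potential of a toy target pi = N(2, 1): V(x) = -log pi(x) up to an additive constant.
    return 0.5 * (x - 2.0) ** 2

class AffineFlow(torch.nn.Module):
    # Simplest possible "flow": x = mu + exp(log_sigma) * z with z ~ N(0, 1),
    # standing in for a deep normalizing flow; its exact log-density is available.
    def __init__(self):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.zeros(1))
        self.log_sigma = torch.nn.Parameter(torch.zeros(1))

    def sample(self, n):
        z = torch.randn(n, 1)
        return self.mu + torch.exp(self.log_sigma) * z  # reparameterized, differentiable

    def log_prob(self, x):
        z = (x - self.mu) * torch.exp(-self.log_sigma)
        return -0.5 * z ** 2 - self.log_sigma - 0.5 * math.log(2 * math.pi)

eta = 0.5  # proximal step size (assumed constant here)
flow = AffineFlow()
for k in range(10):  # outer IKLPD-style iterations
    prev = AffineFlow()  # frozen copy of q_k for the KL penalty
    prev.load_state_dict(flow.state_dict())
    for p in prev.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(flow.parameters(), lr=1e-2)
    for _ in range(500):  # inner loop: approximately solve the proximal subproblem
        opt.zero_grad()
        x = flow.sample(256)
        log_q = flow.log_prob(x)
        F_hat = (V(x) + log_q).mean()               # F(q) = KL(q || pi) + const
        kl_hat = (log_q - prev.log_prob(x)).mean()  # Monte Carlo estimate of KL(q_theta || q_k)
        (F_hat + kl_hat / eta).backward()
        opt.step()

print(float(flow.mu), float(torch.exp(flow.log_sigma)))  # should approach 2.0 and 1.0

Each outer iteration freezes a copy of the current flow as q_k and then approximately solves the proximal subproblem by stochastic gradient descent on reparameterized samples; for this target the learned mean and scale should approach 2.0 and 1.0.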
Pages: 31