Online Convex Optimization With Time-Varying Constraints and Bandit Feedback

被引：69

作者：

Cao, Xuanyu ^{[1
]}

Liu, K. J. Ray ^{[2
]}

机构：

[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA

[2] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA

来源：

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2019年 / 64卷 / 07期

关键词：

Bandit feedback; constrained optimization; online convex optimization (OCO); stochastic optimization; ALGORITHMS; REGRET;

D O I：

10.1109/TAC.2018.2884653

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

In this paper, online convex optimization problem with time-varying constraints is studied from the perspective of an agent taking sequential actions. Both the objective function and the constraint functions are dynamic and unknown a priori to the agent. We first consider the scenario of the gradient feedback, in which, the values and gradients of the objective function and constraint functions at the chosen action are revealed after an action is submitted. We propose a computationally efficient online algorithm, which only involves direct closed-form computations at each time instant. It is shown that the algorithm possesses sublinear regret with respect to the dynamic benchmark sequence and sublinear constraint violations, as long as the drift of the benchmark sequence is sublinear, or in other words, the underlying dynamic optimization problems do not vary too drastically. Furthermore, we investigate the scenario of the bandit feedback, in which, after an action is chosen, only the values of the objective function and the constraint functions at several random points close to the action are announced to the agent. A bandit version of the online algorithm is proposed and we also establish its sublinear expected regret and sublinear expected constraint violations under the assumption that the drift of the benchmark sequence is sublinear. Finally, two numerical examples, namely online quadratic programming and online logistic regression, are presented to corroborate the effectiveness of the proposed algorithms and to confirm the theoretical guarantees.

引用

页码：2665 / 2680

页数：16

共 29 条

[1] Interior-Point Methods for Full-Information and Bandit Online Learning
Abernethy, Jacob D.
Hazan, Elad
Rakhlin, Alexander
[J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2012, 58 (07) : 4164 - 4175
[2] Agarwal A, 2010, Colt, P28
[3] [Anonymous], 1999, NONLINEAR PROGRAMMIN
[4] [Anonymous], 2013, SIGMETRICS PERFORM E
[5] Non-Stationary Stochastic Optimization
Besbes, Omar
Gur, Yonatan
Zeevi, Assaf
[J]. OPERATIONS RESEARCH, 2015, 63 (05) : 1227 - 1244
[6] Boyd Stephan, 2004, CONVEX OPTIMIZATION
[7] Chen C, 2014, 2014 INTERNATIONAL CONFERENCE ON MECHATRONICS AND CONTROL (ICMC), P11, DOI 10.1109/ICMC.2014.7231506
[8] Chen NJ, 2016, SIGMETRICS/PERFORMANCE 2016: PROCEEDINGS OF THE SIGMETRICS/PERFORMANCE JOINT INTERNATIONAL CONFERENCE ON MEASUREMENT AND MODELING OF COMPUTER SCIENCE, P193, DOI [10.1145/2896377.2901464, 10.1145/2964791.2901464]
[9] An Online Convex Optimization Approach to Proactive Network Resource Allocation
Chen, Tianyi
Ling, Qing
Giannakis, Georgios B.
[J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017, 65 (24) : 6350 - 6364
[10] Prediction-Correction Interior-Point Method for Time-Varying Convex Optimization
Fazlyab, Mahyar
Paternain, Santiago
Preciado, Victor M.
Ribeiro, Alejandro
[J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (07) : 1973 - 1986

← 1 2 3 →