Constrained Online Convex Optimization With Feedback Delays

被引：26

作者：

Cao, Xuanyu ^{[1
]}

Zhang, Junshan ^{[2
]}

Poor, H. Vincent ^{[3
]}

机构：

[1] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA

[2] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85287 USA

[3] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA

来源：

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2021年 / 66卷 / 11期

关键词：

Delays; Optimization; Power systems; Convex functions; Time factors; Benchmark testing; Collaborative work; Bandit feedback; constrained optimization; feedback delay; function feedback; online convex optimization (OCO); REGRET;

D O I：

10.1109/TAC.2020.3030743

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

In this article, we study constrained online convex optimization (OCO) in the presence of feedback delays, where a decision maker chooses sequential actions without knowing the loss functions and constraint functions a priori. The loss/constraint functions vary with time and their feedback information is revealed to the decision maker with delays, which arise in many applications. We first consider the scenario of delayed function feedback, in which the complete information of the loss/constraint functions is revealed to the decision maker with delays. We develop a modified online saddle point algorithm suitable for constrained OCO with feedback delays. Sublinear regret and sublinear constraint violation bounds are established for the algorithm in terms of the delays. In practice, the complete information (functional forms) of the loss/constraint functions may not be revealed to the decision maker. Thus, we further examine the scenario of delayed bandit feedback, where only the values of the loss/constraint functions at two random points close to the chosen action are revealed to the decision maker with delays. A delayed version of the bandit online saddle point algorithm is proposed by utilizing stochastic gradient estimates of the loss/constraint functions based on delayed bandit feedback. We also establish sublinear regret and sublinear constraint violation bounds for this bandit optimization algorithm in terms of the delays. Finally, numerical results for online quadratically constrained quadratic programs are presented to corroborate the efficacy of the proposed algorithms.

引用

页码：5049 / 5064

页数：16

共 24 条

[1]

Agarwal A., 2010, COLT 2010 23 C LEARN, P28

[2]

[Anonymous], 2018, ARXIV180608301

[3]

Cao X., 2018, PROC IEEE INT C COMM, P1

[4] Online Convex Optimization With Time-Varying Constraints and Bandit Feedback [J].

Cao, Xuanyu ;

Liu, K. J. Ray .

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (07) :2665-2680

[5]

Cesa-Bianchi N., 2013, Advances in Neural Information Processing Systems, P1160

[6] Bandit Convex Optimization for Scalable and Dynamic IoT Management [J].

Chen, Tianyi ;

Giannakis, Georgios B. .

IEEE INTERNET OF THINGS JOURNAL, 2019, 6 (01) :1276-1286

[7] An Online Convex Optimization Approach to Proactive Network Resource Allocation [J].

Chen, Tianyi ;

Ling, Qing ;

Giannakis, Georgios B. .

IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017, 65 (24) :6350-6364

[8]

Cutkosky A, 2016, ADV NEUR IN, V29

[9]

Flaxman AD, 2005, PROCEEDINGS OF THE SIXTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P385

[10]

Hazan E., 2016, Found. Trends Optim., V2, P157

← 1 2 3 →