Solving Stochastic Compositional Optimization is Nearly as Easy as Solving Stochastic Optimization

Cited by: 40
Authors
Chen, Tianyi [1 ,2 ]
Sun, Yuejiao [3 ]
Yin, Wotao [3 ]
Affiliations
[1] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
[2] Rensselaer Polytech Inst, Inst Data Explorat & Applicat, Troy, NY 12180 USA
[3] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90095 USA
Funding
U.S. National Science Foundation
Keywords
Stochastic processes; Optimization; Task analysis; Convergence; Signal processing algorithms; Gradient methods; Complexity theory; Stochastic approximation; stochastic compositional optimization; adaptive gradients; meta learning;
DOI
10.1109/TSP.2021.3092377
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
Stochastic compositional optimization generalizes classic (non-compositional) stochastic optimization to the minimization of compositions of functions. Each composition may introduce an additional expectation, and these expectations may be nested. Stochastic compositional optimization is gaining popularity in applications such as reinforcement learning and meta learning. This paper presents a new Stochastically Corrected Stochastic Compositional gradient method (SCSC). SCSC runs in a single time scale with a single loop, uses a fixed batch size, and is guaranteed to converge at the same rate as the stochastic gradient descent (SGD) method for non-compositional stochastic optimization. This is achieved by making a careful improvement to a popular stochastic compositional gradient method. It is easy to apply SGD-improvement techniques to accelerate SCSC, which helps it achieve state-of-the-art performance for stochastic compositional optimization. In particular, we apply Adam to SCSC, and the resulting rate of convergence matches that of the original Adam on non-compositional stochastic optimization. We test SCSC on model-agnostic meta-learning tasks.
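To make the single-time-scale, single-loop idea in the abstract concrete, here is a minimal Python sketch of a stochastically corrected compositional gradient step on a toy problem. Everything below is an assumption made for illustration: the toy objective f(E[g(x; phi)]) with g(x; phi) = x + phi and f(y) = ||y||^2, the step-size and correction-weight schedules, and all variable names. It sketches the correction idea, not the paper's verbatim SCSC algorithm or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
x = rng.standard_normal(d)

def sample_phi():
    # inner-level noise; the variance is an illustrative choice
    return 0.1 * rng.standard_normal(d)

def grad_f(y):
    # outer gradient for the toy choice f(y) = ||y||^2
    return 2.0 * y

# y tracks the inner expectation E_phi[g(x; phi)] = x for this toy g
y = x + sample_phi()

for k in range(1, 2001):
    alpha = 0.1 / np.sqrt(k)       # diminishing step size (assumed schedule)
    beta = min(1.0, 10.0 * alpha)  # tracker weight (assumed schedule)

    # compositional gradient step; the Jacobian of g w.r.t. x is the
    # identity for this toy g, so only grad_f at the tracker y appears
    x_new = x - alpha * grad_f(y)

    # stochastically corrected tracking update: g is evaluated at the old
    # and new iterates with the SAME sample phi, so their difference
    # corrects the tracker for the movement of x inside a single loop
    phi = sample_phi()
    y = (1.0 - beta) * (y + (x_new + phi) - (x + phi)) + beta * (x_new + phi)

    x = x_new

print("final distance to the minimizer x* = 0:", np.linalg.norm(x))
```

The distinguishing feature, under this reading of the abstract, is the shared-sample difference g(x_new; phi) - g(x; phi) in the tracker update: it lets y follow the moving inner expectation in a single loop with a fixed batch size, rather than requiring a second, slower time scale.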
Pages: 4937-4948
Page count: 12