Distributed Stochastic Proximal Algorithm With Random Reshuffling for Nonsmooth Finite-Sum Optimization

被引：4

作者：

Jiang, Xia ^{[1
,2
]}

Zeng, Xianlin ^{[1
,2
]}

Sun, Jian ^{[1
,2
]}

Chen, Jie ^{[3
,4
,5
]}

Xie, Lihua ^{[6
]}

机构：

[1] Beijing Inst Technol, Sch Automat, Key Lab Intelligent Control & Decis Complex Syst, Beijing, Peoples R China

[2] Beijing Inst Technol, Chongqing Innovat Ctr, Chongqing 401120, Peoples R China

[3] Tongji Univ, Sch Elect & Informat Engn, Shanghai 200082, Peoples R China

[4] Beijing Inst Technol, Beijing Adv Innovat Ctr Intelligent Robots & Syst, Beijing 100081, Peoples R China

[5] Beijing Inst Technol, Key Lab Biomimet Robots & Syst, Minist Educ, Beijing 100081, Peoples R China

[6] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore

来源：

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024年 / 35卷 / 03期

基金：

中国国家自然科学基金;

关键词：

Distributed optimization; proximal operator; random reshuffling (RR); stochastic algorithm; time-varying graphs; GRADIENT ALGORITHMS; SUBGRADIENT METHODS;

D O I：

10.1109/TNNLS.2022.3201711

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

The nonsmooth finite-sum minimization is a fundamental problem in machine learning. This article develops a distributed stochastic proximal-gradient algorithm with random reshuffling to solve the finite-sum minimization over time-varying multiagent networks. The objective function is a sum of differentiable convex functions and nonsmooth regularization. Each agent in the network updates local variables by local information exchange and cooperates to seek an optimal solution. We prove that local variable estimates generated by the proposed algorithm achieve consensus and are attracted to a neighborhood of the optimal solution with an O((1/T)+(1/root T)) convergence rate, where T is the total number of iterations. Finally, some comparative simulations are provided to verify the convergence performance of the proposed algorithm.

引用

页码：4082 / 4096

页数：15

共 45 条

[1]

Abu-Mostafa Yaser S., 2012, Learning from Data: A Short Course: AMLbook

[2]

Ahn K., 2020, P ADV NEUR INF PROC, P1

[3] Decentralized Proximal Gradient Algorithms With Linear Convergence Rates [J].

Alghunaim, Sulaiman A. ;

Ryu, Ernest K. ;

Yuan, Kun ;

Sayed, Ali H. .

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (06) :2787-2794

[4]

[Anonymous], 2018, 2018 IEEE ACM 26 INT, DOI DOI 10.1109/IWQOS.2018.8624168

[5]

Astorino A, 2007, IEEE T PATTERN ANAL, V29, P2135, DOI [10.1109/TPAMI.2007.1102, 10.1109/TPAMI.2007.1102.]

[6] A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems [J].

Beck, Amir ;

Teboulle, Marc .

SIAM JOURNAL ON IMAGING SCIENCES, 2009, 2 (01) :183-202

[7]

Bertsekas D. P., 2015, ARXIV

[8]

Bertsekas DP, 2012, OPTIMIZATION FOR MACHINE LEARNING, P85

[9] Performance of a Distributed Stochastic Approximation Algorithm [J].

Bianchi, Pascal ;

Fort, Gersende ;

Hachem, Walid .

IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (11) :7405-7418

[10]

Chen AI, 2012, ANN ALLERTON CONF, P601, DOI 10.1109/Allerton.2012.6483273

← 1 2 3 4 5 →