Convex-Concave Programming: An Effective Alternative for Optimizing Shallow Neural Networks

Cited by: 0
Authors
Askarizadeh, Mohammad [1]
Morsali, Alireza [2]
Tofigh, Sadegh [1]
Nguyen, Kim Khoa [1]
Affiliations
[1] Univ Quebec, Ecole Technol Super, Dept Elect & Comp Engn, Montreal, PQ H3C 1K3, Canada
[2] Aibrary, R&D, Burnaby, BC V5C 3C5, Canada
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2025, Vol. 9, No. 4
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Neural network; difference convex function; convex-concave algorithm; convergence rate; complexity analysis; FEEDFORWARD NETWORKS; APPROXIMATION; OPTIMIZATION; ALGORITHMS;
DOI
10.1109/TETCI.2024.3502463
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In this study, we address the challenges of non-convex optimization in neural networks (NNs) by formulating the training of multilayer perceptron (MLP) NNs as a difference-of-convex-functions (DC) problem. Using the basic convex-concave algorithm to solve our DC problems, we introduce two alternative optimization techniques, DC-GD and DC-OPT, for determining MLP parameters. By leveraging the non-uniqueness of the convex components in DC functions, we construct strongly convex components for the DC NN cost function. This strong convexity enables our proposed algorithms, DC-GD and DC-OPT, to achieve an iteration complexity of O(log(1/epsilon)), surpassing that of other solvers such as stochastic gradient descent (SGD), which has an iteration complexity of O(1/epsilon). This improvement raises the convergence rate from sublinear (SGD) to linear (ours) while maintaining comparable total computational cost. Furthermore, conventional NN optimizers such as SGD, RMSprop, and Adam are highly sensitive to the learning rate, adding computational overhead for practitioners who must select an appropriate value. In contrast, our DC-OPT algorithm is hyperparameter-free (i.e., it requires no learning rate), and our DC-GD algorithm is less sensitive to the learning rate while offering accuracy comparable to other solvers. Additionally, we extend our approach to a convolutional NN architecture, demonstrating its applicability to modern NNs. We evaluate the performance of the proposed algorithms by comparing them to conventional optimizers such as SGD, RMSprop, and Adam across various test cases. The results suggest that our approach is a viable alternative for optimizing shallow MLP NNs.
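To make the mechanics described in the abstract concrete, the sketch below applies the basic convex-concave procedure (CCP) to a toy difference-of-convex objective. It is not the paper's DC-GD or DC-OPT algorithm; the objective, the parameters lam, delta, rho, and the problem dimensions are assumptions chosen purely for illustration. It shows the two ideas the abstract relies on: adding the same quadratic (rho/2)||x||^2 to both DC components, which leaves the objective unchanged but makes each component strongly convex, and the CCP update that linearizes the subtracted convex part and minimizes the resulting convex surrogate, which guarantees monotone descent of the objective.

```python
import numpy as np

# Toy DC objective: f(x) = ||Ax - b||^2 - lam * sum_i sqrt(x_i^2 + delta)
# DC split: g(x) = ||Ax - b||^2 (convex), h(x) = lam * sum_i sqrt(x_i^2 + delta) (convex).
# Non-uniqueness trick: add (rho/2)||x||^2 to BOTH g and h, so f = g - h is unchanged
# but each augmented component is strongly convex.

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam, delta, rho = 5.0, 0.01, 1.0      # hypothetical values, for illustration only

def f(x):
    r = A @ x - b
    return r @ r - lam * np.sum(np.sqrt(x**2 + delta))

def grad_h(x):
    # Gradient of the subtracted convex part h(x) = lam * sum_i sqrt(x_i^2 + delta).
    return lam * x / np.sqrt(x**2 + delta)

x = rng.standard_normal(n)
H = 2.0 * A.T @ A + rho * np.eye(n)   # Hessian of the augmented convex part g(x) + (rho/2)||x||^2

for k in range(20):
    # CCP step: linearize h(x) + (rho/2)||x||^2 at the current iterate and minimize
    #   g(x) + (rho/2)||x||^2 - <grad_h(x_k) + rho * x_k, x>,
    # which for this quadratic g reduces to a single linear solve.
    c = grad_h(x) + rho * x
    x = np.linalg.solve(H, 2.0 * A.T @ b + c)
    print(f"iter {k:2d}   f(x) = {f(x):.6f}")   # f(x) is non-increasing across iterations
```

The closed-form subproblem solve is possible here only because the toy g is quadratic; in the paper's setting the convexified surrogate is built from the MLP cost and is handled by the proposed DC-GD and DC-OPT schemes instead.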
Pages: 2894-2907
Page count: 14