Non-convex optimization for machine learning

Cited by: 294
Authors
Jain P. [1 ]
Kar P. [2 ]
Affiliations
[1] IIT, Kanpur
Source
Foundations and Trends in Machine Learning | 2017 / Vol. 10 / Issue 3-4
Keywords
Heuristic methods; Problem solving; Convex optimization; Learning systems; Machine learning; Inference engines; Learning algorithms
DOI
10.1561/2200000058
Abstract
A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed, or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround has been to relax non-convex problems to convex ones and use traditional methods to solve the (convex) relaxed optimization problems. However, this approach may be lossy and nevertheless presents significant challenges for large-scale optimization. On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques; popular heuristics include projected gradient descent and alternating minimization. However, these are often poorly understood in terms of their convergence and other properties. This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. We hope that an insight into the inner workings of these methods will allow the reader to appreciate the unique marriage of task structure and generative models that allows these heuristic techniques to (provably) succeed. The monograph leads the reader through several widely used non-convex optimization techniques, as well as applications thereof.
The goal of this monograph is both to introduce the rich literature in this area and to equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems. © 2017 P. Jain and P. Kar.
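To make the projected-gradient heuristic mentioned in the abstract concrete, the sketch below shows one well-known instance: iterative hard thresholding for sparse least-squares recovery, where the projection step maps onto the non-convex set of s-sparse vectors. This is a generic illustration, not the monograph's exact formulation; the problem sizes and step size are illustrative assumptions.

```python
import numpy as np

def hard_threshold(v, s):
    """Project v onto the set of s-sparse vectors by keeping
    its s largest-magnitude entries and zeroing the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

def projected_gradient_descent(A, y, s, step=1.0, iters=200):
    """Minimize ||A x - y||^2 subject to the non-convex
    constraint ||x||_0 <= s, via gradient steps followed by
    projection (iterative hard thresholding)."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)          # gradient of the least-squares loss
        x = hard_threshold(x - step * grad, s)  # non-convex projection
    return x

# Toy example: recover a 3-sparse signal from noiseless random measurements.
rng = np.random.default_rng(0)
n, d, s = 50, 100, 3
A = rng.standard_normal((n, d)) / np.sqrt(n)  # columns roughly unit-norm
x_true = np.zeros(d)
x_true[[5, 17, 42]] = [1.0, -2.0, 3.0]
y = A @ x_true
x_hat = projected_gradient_descent(A, y, s)
```

Under restricted-isometry-type conditions on A (which hold with high probability for such Gaussian matrices when n is large enough relative to s), this simple procedure provably recovers x_true despite the non-convexity of the constraint set.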
Pages: 142-336
Page count: 194