GGD: Grafting Gradient Descent

Times cited: 0
Authors
Feng, Yanjing [1 ]
Zhou, Yongdao [1 ]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, NITFID, Tianjin 300071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
stochastic optimization; importance sampling; minibatching; variance reduction; adaptive stepsize method; OPTIMIZATION;
DOI
Not available
CLC number
TP [automation technology, computer technology];
Discipline code
0812;
Abstract
Simple random sampling has been widely used in traditional stochastic optimization algorithms. Although a gradient sampled by simple random sampling is a descent direction in expectation, it may have relatively high variance, which causes the descent curve to wiggle and slows down the optimization process. In this paper, we propose a novel stochastic optimization method, grafting gradient descent (GGD), which grafts together minibatching and importance sampling, and we provide convergence results for GGD. We show that the grafting gradient possesses a doubly robust property, which ensures that GGD performs at least as well as the worse of SGD with importance sampling and mini-batch SGD. Combined with advanced variance reduction techniques such as stochastic variance reduced gradient (SVRG) and adaptive stepsize methods such as Adam, composite GGD-based methods and their theoretical bounds are also provided. Real data studies show that GGD achieves an intermediate performance between SGD with importance sampling and mini-batch SGD, and outperforms the original SGD method. The proposed GGD is therefore a better and more robust stochastic optimization framework in practice.
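To make the two ingredients concrete, the sketch below shows one step of importance-sampled mini-batch SGD on a least-squares objective: examples are drawn with probabilities proportional to their per-example gradient norms, and each sampled gradient is reweighted by 1/(n·p_i) so the estimate stays unbiased. This is only an illustration of the components the abstract says GGD combines, not the authors' grafting algorithm; the function name and the norm-proportional sampling rule are assumptions for the example.

```python
import numpy as np

def is_minibatch_sgd_step(w, X, y, lr=0.1, batch=4, rng=None):
    """One SGD step on 0.5 * mean((X w - y)^2), combining minibatching
    with importance sampling. Sampling probabilities p_i are proportional
    to per-example gradient norms; dividing each sampled gradient by
    n * p_i keeps the mini-batch estimate unbiased for the full gradient.
    (Illustrative sketch only, not the GGD algorithm itself.)"""
    rng = rng or np.random.default_rng(0)
    n = len(y)
    resid = X @ w - y                         # per-example residuals
    grads = resid[:, None] * X                # gradient of 0.5*(x_i.w - y_i)^2 per example
    norms = np.linalg.norm(grads, axis=1) + 1e-12
    p = norms / norms.sum()                   # importance-sampling distribution
    idx = rng.choice(n, size=batch, p=p)      # sample a mini-batch with probs p
    g = (grads[idx] / (n * p[idx, None])).mean(axis=0)  # unbiased gradient estimate
    return w - lr * g
```

With uniform p this reduces to plain mini-batch SGD, and with batch=1 to SGD with importance sampling, which is why a method interpolating between the two can inherit the better variance behavior of each.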
Pages: 87