Pegasos: primal estimated sub-gradient solver for SVM

Cited by: 659
Authors
Shalev-Shwartz, Shai [2 ]
Singer, Yoram [3 ]
Srebro, Nathan [1 ]
Cotter, Andrew [1 ]
Affiliations
[1] Toyota Technological Institute, Chicago, IL 60637, USA
[2] Hebrew University of Jerusalem, School of Computer Science and Engineering, Jerusalem, Israel
[3] Google, Mountain View, CA, USA
Funding
Israel Science Foundation
Keywords
SVM; Stochastic gradient descent; Online
DOI
10.1007/s10107-010-0420-4
Chinese Library Classification
TP31 [Computer Software]
Discipline Classification Codes
081202 ; 0835 ;
Abstract
We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ε²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λε)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
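The abstract states only the method's complexity; as a minimal sketch of the single-example update it refers to, assuming the standard Pegasos step size η_t = 1/(λt), a hinge-loss sub-gradient step on one randomly drawn example, and an optional projection onto the ball of radius 1/√λ (function and parameter names here are illustrative, not from the paper's text):

    import numpy as np

    def pegasos(X, y, lam=0.1, n_iters=10000, seed=0):
        # Minimal sketch: stochastic sub-gradient descent on the primal SVM
        # objective (lam/2) * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i * <w, x_i>).
        # X is an (m, d) array of examples; y is an (m,) array of labels in {-1, +1}.
        rng = np.random.default_rng(seed)
        m, d = X.shape
        w = np.zeros(d)
        for t in range(1, n_iters + 1):
            i = rng.integers(m)            # each iteration touches one random example
            eta = 1.0 / (lam * t)          # step size 1 / (lambda * t)
            if y[i] * X[i].dot(w) < 1.0:   # hinge loss active: sub-gradient is lam*w - y_i*x_i
                w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
            else:                          # hinge loss inactive: only the regularizer contributes
                w = (1.0 - eta * lam) * w
            norm = np.linalg.norm(w)       # optional projection onto the ball of radius 1/sqrt(lam)
            if norm > 1.0 / np.sqrt(lam):
                w *= 1.0 / (np.sqrt(lam) * norm)
        return w

Since each iteration costs time proportional to the number of non-zero features of the sampled example, and neither the iteration count nor the per-iteration cost grows with m, a call such as w = pegasos(X, y, lam=0.01) followed by predictions np.sign(X @ w) reflects the abstract's claim that the run-time does not depend directly on the training set size.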
Pages: 3-30
Page count: 28