BPPGD: Budgeted Parallel Primal grAdient desCent Kernel SVM on Spark

Cited by: 7
Authors
Sai, Jinchen [1]
Wang, Bai [1]
Wu, Bin [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing Key Lab Intelligent Telecommun Software &, Beijing, Peoples R China
Source
2016 IEEE FIRST INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC 2016) | 2016
Keywords
stochastic gradient descent; support vector machines; Spark; kernel method; large-scale learning; packing strategy; budget maintenance; distributed hash table;
DOI
10.1109/DSC.2016.36
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
Stochastic Gradient Descent (SGD) is the best-known method for optimizing the primal objective of linear support vector machines (SVMs) on large data. When equipped with kernel functions, however, SGD suffers from unbounded linear growth in model size and per-update time as the data size grows. This paper describes a budgeted parallel pack gradient descent algorithm (BPPGD) that scales SVM training with the Gaussian Radial Basis Function (RBF) kernel to large-scale data and runs efficiently on Apache Spark with a high degree of parallelism. Apache Spark is a fast, general engine for large-scale data processing that is well suited to parallel computation on big data and to iterative algorithms. BPPGD has constant time complexity per update. It uses a new distributed hash table, IndexedRDD, to increase the degree of parallelism, a packing strategy to improve SGD performance by reducing the number of communication rounds, and a removal-based budget maintenance method to bound the number of support vectors (SVs). Experimental results show that BPPGD achieves higher accuracy than the P-packSVM (Zhu et al., 2009) and BSGD (Wang et al., 2012) algorithms in a Spark environment while taking less time.
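To make the kernelized, budgeted SGD idea in the abstract concrete, below is a minimal single-machine sketch of primal SGD for an RBF-kernel SVM with removal-based budget maintenance. It is an illustration only, not the paper's Spark/IndexedRDD implementation or its packing strategy; the function names, the hyperparameter values (budget, gamma, lam), and the smallest-coefficient removal rule are assumptions made for this sketch.

import math
import random

def rbf(x, z, gamma):
    # Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def budgeted_kernel_sgd(data, budget=50, gamma=0.5, lam=0.01, epochs=5, seed=0):
    # Primal SGD on the hinge-loss SVM objective in the RBF feature space.
    # The model is a list of (support vector, coefficient) pairs; once the
    # list exceeds `budget`, the SV with the smallest |coefficient| is
    # removed (a simple removal-based budget maintenance rule).
    rng = random.Random(seed)
    svs = []  # list of (x_i, alpha_i)
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos-style decreasing step size
            f = sum(a * rbf(xi, x, gamma) for xi, a in svs)
            svs = [(xi, (1.0 - eta * lam) * a) for xi, a in svs]  # regularization shrink
            if y * f < 1.0:  # hinge-loss margin violation -> add a new SV
                svs.append((x, eta * y))
                if len(svs) > budget:  # budget maintenance by removal
                    svs.remove(min(svs, key=lambda s: abs(s[1])))
    return svs, gamma

def predict(model, x):
    svs, gamma = model
    return 1 if sum(a * rbf(xi, x, gamma) for xi, a in svs) >= 0 else -1

if __name__ == "__main__":
    # Toy data: label +1 clustered around (1, 1), label -1 around (-1, -1).
    rng = random.Random(1)
    data = ([([1 + rng.gauss(0, 0.3), 1 + rng.gauss(0, 0.3)], 1) for _ in range(100)]
            + [([-1 + rng.gauss(0, 0.3), -1 + rng.gauss(0, 0.3)], -1) for _ in range(100)])
    model = budgeted_kernel_sgd(data)
    acc = sum(predict(model, x) == y for x, y in data) / len(data)
    print("support vectors:", len(model[0]), "training accuracy:", acc)

Bounding the support-vector set by a fixed budget is what yields the constant per-update cost mentioned in the abstract: each step evaluates at most `budget` kernel values, regardless of how much data has been processed.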
Pages: 74 - 79
Number of pages: 6
Related Papers
50 records in total
  • [1] P-packSVM: Parallel Primal grAdient desCent Kernel SVM
    Zhu, Zeyuan Allen
    Chen, Weizhu
    Wang, Gang
    Zhu, Chenguang
    Chen, Zheng
    2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 677 - +
  • [2] Budgeted Mini-Batch Parallel Gradient Descent for Support Vector Machines on Spark
    Tao, Hang
    Wu, Bin
    Lin, Xiuqin
    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 945 - 950
  • [3] Nonparametric Budgeted Stochastic Gradient Descent
    Trung Le
    Vu Nguyen
    Tu Dinh Nguyen
    Dinh Phung
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 564 - 572
  • [4] Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training
    Wang, Zhuang
    Crammer, Koby
    Vucetic, Slobodan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2012, 13 : 3103 - 3131
  • [6] A primal perspective for indefinite kernel SVM problem
    Xue, Hui
    Xu, Haiming
    Chen, Xiaohong
    Wang, Yunyun
    FRONTIERS OF COMPUTER SCIENCE, 2020, 14 (02) : 349 - 363
  • [7] A primal perspective for indefinite kernel SVM problem
    Hui Xue
    Haiming Xu
    Xiaohong Chen
    Yunyun Wang
    Frontiers of Computer Science, 2020, 14 : 349 - 363
  • [8] Performance Optimization on Model Synchronization in Parallel Stochastic Gradient Descent Based SVM
    Abeykoon, Vibhatha
    Fox, Geoffrey
    Kim, Minje
    2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019, : 508 - 517
  • [9] Large-scale support vector regression with budgeted stochastic gradient descent
    Zongxia Xie
    Yingda Li
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 1529 - 1541
  • [10] Large-scale support vector regression with budgeted stochastic gradient descent
    Xie, Zongxia
    Li, Yingda
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (06) : 1529 - 1541