BPPGD: Budgeted Parallel Primal grAdient desCent Kernel SVM on Spark

Cited by: 7
Authors
Sai, Jinchen [1]
Wang, Bai [1]
Wu, Bin [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing Key Lab Intelligent Telecommun Software &, Beijing, Peoples R China
Source
2016 IEEE FIRST INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC 2016) | 2016
Keywords
stochastic gradient descent; support vector machines; Spark; kernel method; large-scale learning; packing strategy; budget maintenance; distributed hash table;
DOI
10.1109/DSC.2016.36
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
Stochastic Gradient Descent (SGD) is the best-known method for optimizing the primal objective of linear support vector machines (SVMs) on large data. When equipped with kernel functions, however, SGD suffers from unbounded linear growth in model size and per-update time as the data size grows. This paper describes a budgeted parallel pack gradient descent algorithm (BPPGD) that scales SVM training with the Gaussian Radial Basis Function (RBF) kernel to large-scale data and runs efficiently on Apache Spark with a high degree of parallelism. Apache Spark is a fast, general engine for large-scale data processing that is well suited to parallel computation on big data and to iterative algorithms. BPPGD has constant time complexity per update. It uses a new distributed hash table, IndexedRDD, to increase the degree of parallelism, a packing strategy to improve SGD performance by reducing the number of communication rounds, and a removal-based budget maintenance method to bound the number of support vectors (SVs). Experimental results show that BPPGD achieves higher accuracy than the P-packSVM (Zhu et al., 2009) and BSGD (Wang et al., 2012) algorithms in a Spark environment while taking less time.
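To make the kernelized, budgeted SGD idea in the abstract concrete, below is a minimal single-machine sketch of primal SGD for an RBF-kernel SVM with removal-based budget maintenance. It is an illustration only, not the paper's Spark/IndexedRDD implementation or its packing strategy; the function names, the hyperparameter values (budget, gamma, lam), and the smallest-coefficient removal rule are assumptions made for this sketch.

import math
import random

def rbf(x, z, gamma):
    # Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def budgeted_kernel_sgd(data, budget=50, gamma=0.5, lam=0.01, epochs=5, seed=0):
    # Primal SGD on the hinge-loss SVM objective in the RBF feature space.
    # The model is a list of (support vector, coefficient) pairs; once the
    # list exceeds `budget`, the SV with the smallest |coefficient| is
    # removed (a simple removal-based budget maintenance rule).
    rng = random.Random(seed)
    svs = []  # list of (x_i, alpha_i)
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos-style decreasing step size
            f = sum(a * rbf(xi, x, gamma) for xi, a in svs)
            svs = [(xi, (1.0 - eta * lam) * a) for xi, a in svs]  # regularization shrink
            if y * f < 1.0:  # hinge-loss margin violation -> add a new SV
                svs.append((x, eta * y))
                if len(svs) > budget:  # budget maintenance by removal
                    svs.remove(min(svs, key=lambda s: abs(s[1])))
    return svs, gamma

def predict(model, x):
    svs, gamma = model
    return 1 if sum(a * rbf(xi, x, gamma) for xi, a in svs) >= 0 else -1

if __name__ == "__main__":
    # Toy data: label +1 clustered around (1, 1), label -1 around (-1, -1).
    rng = random.Random(1)
    data = ([([1 + rng.gauss(0, 0.3), 1 + rng.gauss(0, 0.3)], 1) for _ in range(100)]
            + [([-1 + rng.gauss(0, 0.3), -1 + rng.gauss(0, 0.3)], -1) for _ in range(100)])
    model = budgeted_kernel_sgd(data)
    acc = sum(predict(model, x) == y for x, y in data) / len(data)
    print("support vectors:", len(model[0]), "training accuracy:", acc)

Bounding the support-vector set by a fixed budget is what yields the constant per-update cost mentioned in the abstract: each step evaluates at most `budget` kernel values, regardless of how much data has been processed.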
Pages: 74 - 79
Number of pages: 6
Related Papers
50 records in total
  • [1] P-packSVM: Parallel Primal grAdient desCent Kernel SVM
    Zhu, Zeyuan Allen
    Chen, Weizhu
    Wang, Gang
    Zhu, Chenguang
    Chen, Zheng
    2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 677 - +
  • [2] Budgeted Mini-Batch Parallel Gradient Descent for Support Vector Machines on Spark
    Tao, Hang
    Wu, Bin
    Lin, Xiuqin
    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 945 - 950
  • [3] Nonparametric Budgeted Stochastic Gradient Descent
    Trung Le
    Vu Nguyen
    Tu Dinh Nguyen
    Dinh Phung
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 564 - 572
  • [4] Breaking the Curse of Kernelization: Budgeted Stochastic Gradient Descent for Large-Scale SVM Training
    Wang, Zhuang
    Crammer, Koby
    Vucetic, Slobodan
    JOURNAL OF MACHINE LEARNING RESEARCH, 2012, 13 : 3103 - 3131
  • [6] A primal perspective for indefinite kernel SVM problem
    Xue, Hui
    Xu, Haiming
    Chen, Xiaohong
    Wang, Yunyun
    FRONTIERS OF COMPUTER SCIENCE, 2020, 14 (02) : 349 - 363
  • [7] A primal perspective for indefinite kernel SVM problem
    Hui Xue
    Haiming Xu
    Xiaohong Chen
    Yunyun Wang
    Frontiers of Computer Science, 2020, 14 : 349 - 363
  • [8] Performance Optimization on Model Synchronization in Parallel Stochastic Gradient Descent Based SVM
    Abeykoon, Vibhatha
    Fox, Geoffrey
    Kim, Minje
    2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019, : 508 - 517
  • [9] Large-scale support vector regression with budgeted stochastic gradient descent
    Zongxia Xie
    Yingda Li
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 1529 - 1541
  • [10] Large-scale support vector regression with budgeted stochastic gradient descent
    Xie, Zongxia
    Li, Yingda
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (06) : 1529 - 1541