On coresets for support vector machines

被引:14
作者
Tukan, Murad [1 ]
Baykal, Cenk [2 ]
Feldman, Dan [1 ]
Rus, Daniela [2 ]
机构
[1] Univ Haifa, Comp Sci Dept, Haifa, Israel
[2] MIT CSAIL, Cambridge, MA USA
基金
美国国家科学基金会;
关键词
Support vector machines; Coresets; Data reduction; Large-scale learning; Streaming;
D O I
10.1016/j.tcs.2021.09.008
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We present an efficient coreset construction algorithm for large-scale Support Vector Machine (SVM) training in Big Data and streaming applications. A coreset is a small, representative subset of the original data points such that a model trained on the coreset is provably competitive with that trained on the original data set. Since the size of the coreset is generally much smaller than the original set, our preprocess-then-train scheme has potential to lead to significant speedups when training SVM models. We prove lower and upper bounds on the size of the coreset required to obtain small data summaries for the SVM problem. As a corollary, we show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings. We evaluate the performance of our algorithm on real-world and synthetic data sets. Our experimental results reaffirm the favorable theoretical properties of our algorithm and demonstrate its practical effectiveness in accelerating SVM training. (C) 2021 Elsevier B.V. All rights reserved.
引用
收藏
页码:171 / 191
页数:21
相关论文
共 35 条
[11]   Dimensionality Reduction for k-Means Clustering and Low Rank Approximation [J].
Cohen, Michael B. ;
Elder, Sam ;
Musco, Cameron ;
Musco, Christopher ;
Persu, Madalina .
STOC'15: PROCEEDINGS OF THE 2015 ACM SYMPOSIUM ON THEORY OF COMPUTING, 2015, :163-172
[12]   Core-sets: An updated survey [J].
Feldman, Dan .
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 10 (01)
[13]  
Feldman D, 2011, ACM S THEORY COMPUT, P569
[14]   Coresets for Polytope Distance [J].
Gaertner, Bernd ;
Jaggi, Martin .
PROCEEDINGS OF THE TWENTY-FIFTH ANNUAL SYMPOSIUM ON COMPUTATIONAL GEOMETRY (SCG'09), 2009, :33-42
[15]  
Har-Peled S, 2007, 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, P836
[16]  
Har-Peled Sariel, 2004, P 36 ANN ACM S THEOR, P291, DOI DOI 10.1145/1007352.1007400
[17]  
Hazan E., 2011, Advances in Neural Information Processing Systems, P1233
[18]  
Henzinger M., ARXIV PREPRINT ARXIV
[19]  
Joachims T., 2006, P 12 ACM SIGKDD INT, P217
[20]   Deterministic Coresets for Stochastic Matrices with Applications to Scalable Sparse PageRank [J].
Lang, Harry ;
Baykal, Cenk ;
Abu Samra, Najib ;
Tannous, Tony ;
Feldman, Dan ;
Rus, Daniela .
THEORY AND APPLICATIONS OF MODELS OF COMPUTATION, TAMC 2019, 2019, 11436 :410-423