Core-sets: An updated survey

被引:25
作者
Feldman, Dan [1 ]
机构
[1] Univ Haifa, Comp Sci Dept, IL-3498838 Haifa, Israel
关键词
big data; cloud; coresets; distributed computation; streaming; EXTENT MEASURES; APPROXIMATION; ALGORITHMS;
D O I
10.1002/widm.1335
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering problems, the input is a set of points in some metric space, and a common goal is to compute a set of centers in some other space (points, lines) that will minimize the sum of distances to these points. In database queries, we may need to compute such a some for a specific query set of k centers. However, traditional algorithms cannot handle modern systems that require parallel real-time computations of infinite distributed streams from sensors such as GPS, audio or video that arrive to a cloud, or networks of weaker devices such as smartphones or robots. Core-set is a "small data" summarization of the input "big data," where every possible query has approximately the same answer on both data sets. Generic techniques enable efficient coreset maintenance of streaming, distributed and dynamic data. Traditional algorithms can then be applied on these coresets to maintain the approximated optimal solutions. The challenge is to design coresets with provable tradeoff between their size and approximation error. This survey summarizes such constructions in a retrospective way, that aims to unified and simplify the state-of-the-art. This article is categorized under Algorithmic Development > Structure Discovery Fundamental Concepts of Data and Knowledge > Big Data Mining Technologies > Machine Learning Algorithmic Development > Scalable Statistical Methods
引用
收藏
页数:18
相关论文
共 102 条
[1]  
Ackermann MR, 2012, J EXP ALGORITHM, V17, P2, DOI [10.1145/2133803.2184450, DOI 10.1145/2133803.2184450]
[2]  
Agarwal P. K., 2004, P 23 ACM SIGMOD SIGA, P155
[3]  
Agarwal Pankaj K., 2005, Combinatorial and Computational Geometry, V1, P1
[4]   Approximating extent measures of points [J].
Agarwal, PK ;
Har-Peled, S ;
Varadarajan, KR .
JOURNAL OF THE ACM, 2004, 51 (04) :606-635
[5]   Approximation algorithms for projective clustering [J].
Agarwal, PK ;
Procopiuc, CM .
JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 2003, 46 (02) :115-139
[6]  
Agarwal PK, 2002, LECT NOTES COMPUT SC, V2461, P54
[7]  
Agarwal PK, 2001, SIAM PROC S, P148
[8]  
Agarwal PK, 2000, PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P538
[9]  
Aggarwal A, 2009, LECT NOTES COMPUT SC, V5687, P15, DOI 10.1007/978-3-642-03685-9_2
[10]  
Akavia A., 2019, SECURE DATA RETRIEVA