Improved MPC Algorithms for Edit Distance and Ulam Distance

被引:1
作者
Boroujeni, Mahdi [1 ]
Ghodsi, Mohammad [1 ,2 ]
Seddighin, Saeed [3 ]
机构
[1] Sharif Univ Technol, Dept Comp Engn, Tehran 7941776655, Iran
[2] Inst Res Fundamental Sci, Sch Comp Sci, Tehran 1953833511, Iran
[3] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
关键词
MapReduce; parallel algorithms; approximation algorithms; ulam distance; edit distance;
D O I
10.1109/TPDS.2021.3076534
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Edit distance is one of the most fundamental problems in combinatorial optimization to measure the similarity between strings. Ulam distance is a special case of edit distance where no character is allowed to appear more than once in a string. Recent developments have been very fruitful for obtaining fast and parallel algorithms for both edit distance and Ulam distance. In this work, we present an almost optimal MPC (massively parallel computation) algorithm for Ulam distance and improve MPC algorithms for edit distance. Our algorithm for Ulam distance is almost optimal in the sense that (1) the approximation factor of our algorithm is 1 + epsilon, (2) the round complexity of our algorithm is constant, (3) the total memory of our algorithm is almost linear ((O) over tilde (epsilon)(n)), and (4) the overall running time of our algorithm is almost linear which is the best known for Ulam distance. We also improve the work of Hajiaghayi et al. for edit distance in terms of total memory. The best previously known MPC algorithm for edit distance requires (O) over tilde (n(2x))machines when the memory of each machine is bounded by (O) over tilde (n(1-x)). In this work, we improve the number of machines to (O) over tilde (n((9/5x)) while keeping the memory limit intact. Moreover, the round complexity of our algorithm is constant and the total running time of our algorithm is truly subquadratic. However, our improvement comes at the expense of a constant factor in the approximation guarantee of the algorithm. This improvement is inspired by the recent techniques of Boroujeni et al. and Chakraborty et al. for obtaining truly subquadratic time algorithms for edit distance.
引用
收藏
页码:2764 / 2776
页数:13
相关论文
共 31 条
[1]   Longest increasing subsequences: From patience sorting to the Baik-Deift-Johansson theorem [J].
Aldous, D ;
Diaconis, P .
BULLETIN OF THE AMERICAN MATHEMATICAL SOCIETY, 1999, 36 (04) :413-432
[2]   Edit Distance in Near-Linear Time: it's a Constant Factor [J].
Andoni, Alexandr ;
Nosatzki, Negev Shekel .
2020 IEEE 61ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS 2020), 2020, :990-1001
[3]   Parallel Algorithms for Geometric Graph Problems [J].
Andoni, Alexandr ;
Nikolov, Aleksandar ;
Onak, Krzysztof ;
Yaroslavtsev, Grigory .
STOC'14: PROCEEDINGS OF THE 46TH ANNUAL 2014 ACM SYMPOSIUM ON THEORY OF COMPUTING, 2014, :574-583
[4]   Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity [J].
Andoni, Alexandr ;
Krauthgamer, Robert ;
Onak, Krzysztof .
2010 IEEE 51ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, 2010, :377-386
[5]  
Andoni A, 2009, ACM S THEORY COMPUT, P199
[6]  
[Anonymous], 2004, OSDI 04 P 6 C S OPEA
[7]  
[Anonymous], 2013, P ACM SIGACT SIGMOD, DOI [10.1145/2463664.2465224, 10]
[8]  
[Anonymous], 2009, Hadoop: The Definitive Guide
[9]   Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false) [J].
Backurs, Arturs ;
Indyk, Piotr .
STOC'15: PROCEEDINGS OF THE 2015 ACM SYMPOSIUM ON THEORY OF COMPUTING, 2015, :51-58
[10]   Approximating edit distance efficiently [J].
Bar-Yossef, Z ;
Jayram, TS ;
Krauthgamer, R ;
Kumar, R .
45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2004, :550-559