Algorithms for a Topology-aware Massively Parallel Computation Model

被引:0
作者
Hu, Xiao [1 ]
Koutris, Paraschos [2 ]
Blanas, Spyros [3 ]
机构
[1] Duke Univ, Durham, NC 27706 USA
[2] UW Madison, Madison, WI USA
[3] Ohio State Univ, Columbus, OH 43210 USA
来源
PODS '21: PROCEEDINGS OF THE 40TH SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS | 2021年
关键词
query processing; massively parallel computation; topology-aware;
D O I
10.1145/3452021.3458318
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Most of the prior work in massively parallel data processing assumes homogeneity, i.e., every computing unit has the same computational capability and can communicate with every other unit with the same latency and bandwidth. However, this strong assumption of a uniform topology rarely holds in practical settings, where computing units are connected through complex networks. To address this issue, Blanas et al. [9] recently proposed a topology-aware massively parallel computation model that integrates the network structure and heterogeneity in the modeling cost. The network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred between the two endpoints. The computation proceeds in synchronous rounds and the cost of each round is measured as the maximum cost over all the edges in the network. In this work, we take the first step into investigating three fundamental data processing tasks in this topology-aware parallel model: set intersection, cartesian product, and sorting. We focus on network topologies that are tree topologies, and present both lower bounds as well as (asymptotically) matching upper bounds. Instead of assuming a worst-case distribution as in previous results, the optimality of our algorithms is with respect to the initial data distribution among the network nodes. Apart from the theoretical optimality of our results, our protocols are simple, use a constant number of rounds, and we believe can be implemented in practical settings as well.
引用
收藏
页码:199 / 214
页数:16
相关论文
共 47 条
[1]   Optimizing Multiway Joins in a Map-Reduce Environment [J].
Afrati, Foto N. ;
Ullman, Jeffrey D. .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (09) :1282-1298
[2]   Parallel Algorithms for Constructing Range and Nearest-Neighbor Searching Data Structures [J].
Agarwal, Pankaj K. ;
Fox, Kyle ;
Munagala, Kamesh ;
Nath, Abhinandan .
PODS'16: PROCEEDINGS OF THE 35TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2016, :429-440
[3]   Parallel Algorithms for Geometric Graph Problems [J].
Andoni, Alexandr ;
Nikolov, Aleksandar ;
Onak, Krzysztof ;
Yaroslavtsev, Grigory .
STOC'14: PROCEEDINGS OF THE 46TH ANNUAL 2014 ACM SYMPOSIUM ON THEORY OF COMPUTING, 2014, :574-583
[4]  
[Anonymous], 2013, SIGMOD
[5]   Massively Parallel Algorithms for Finding Well-Connected Components in Sparse Graphs [J].
Assadi, Sepehr ;
Sun, Xiaorui ;
Weinstein, Omri .
PROCEEDINGS OF THE 2019 ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING (PODC '19), 2019, :461-470
[6]   A Logarithmic Approximation for Unsplittable Flow on Line Graphs [J].
Bansal, Nikhil ;
Friggstad, Zachary ;
Khandekar, Rohit ;
Salavatipour, Mohammad R. .
ACM TRANSACTIONS ON ALGORITHMS, 2014, 10 (01)
[7]   A New Framework for Distributed Submodular Maximization [J].
Barbosa, Rafael da Ponte ;
Ene, Alina ;
Nguyen, Huy L. ;
Ward, Justin .
2016 IEEE 57TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS), 2016, :645-654
[8]   Skew in Parallel Query Processing [J].
Beame, Paul ;
Koutris, Paraschos ;
Suciu, Dan .
PODS'14: PROCEEDINGS OF THE 33RD ACM SIGMOD-SIGACT-SIGART SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2014, :212-223
[9]  
Beame Paul., 2013, PODS
[10]   Parallel Data Analysis Directly on Scientific File Formats [J].
Blanas, Spyros ;
Wu, Kesheng ;
Byna, Surendra ;
Dong, Bin ;
Shoshani, Arie .
SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, :385-396