MilliSort and MilliQuery: Large-Scale Data-Intensive Computing in Milliseconds

Cited by: 0

Authors:

Li, Yilong [1]

Park, Seo Jin [2]

Ousterhout, John [1]

Affiliations:

[1] Stanford Univ, Stanford, CA 94305 USA

[2] MIT CSAIL, Cambridge, MA USA

Source:

PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON NETWORKED SYSTEM DESIGN AND IMPLEMENTATION | 2021

Keywords:

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Today's datacenter applications couple scale and time: applications that harness large numbers of servers also execute for long periods of time (seconds or more). This paper explores the possibility of flash bursts: applications that use a large number of servers but for very short time intervals (as little as one millisecond). In order to learn more about the feasibility of flash bursts, we developed two new benchmarks, MilliSort and MilliQuery. MilliSort is a sorting application and MilliQuery implements three SQL queries. The goal for both applications was to process as many records as possible in one millisecond, given unlimited resources in a datacenter. The short time scale required a new distributed sorting algorithm for MilliSort that uses a hierarchical form of partitioning. Both applications depended on fast group communication primitives such as shuffle and all-gather. Our implementation of MilliSort can sort 0.84 million items in one millisecond using 120 servers on an HPC cluster; MilliQuery can process .03-48 million items in one millisecond using 60-280 servers, depending on the query. The number of items that each application can process grows quadratically with the time budget. The primary obstacle to scalability is per-message costs, which appear in the form of inefficient shuffles and coordination overhead.


Pages: 593-612

Page count: 20
