MilliSort and MilliQuery: Large-Scale Data-Intensive Computing in Milliseconds

Cited by: 0

Authors:

Li, Yilong [1]

Park, Seo Jin [2]

Ousterhout, John [1]

Affiliations:

[1] Stanford Univ, Stanford, CA 94305 USA

[2] MIT CSAIL, Cambridge, MA USA

Source:

PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION | 2021

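Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract: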

Today's datacenter applications couple scale and time: applications that harness large numbers of servers also execute for long periods of time (seconds or more). This paper explores the possibility of flash bursts: applications that use a large number of servers but for very short time intervals (as little as one millisecond). In order to learn more about the feasibility of flash bursts, we developed two new benchmarks, MilliSort and MilliQuery. MilliSort is a sorting application and MilliQuery implements three SQL queries. The goal for both applications was to process as many records as possible in one millisecond, given unlimited resources in a datacenter. The short time scale required a new distributed sorting algorithm for MilliSort that uses a hierarchical form of partitioning. Both applications depended on fast group communication primitives such as shuffle and all-gather. Our implementation of MilliSort can sort 0.84 million items in one millisecond using 120 servers on an HPC cluster; MilliQuery can process 0.03-48 million items in one millisecond using 60-280 servers, depending on the query. The number of items that each application can process grows quadratically with the time budget. The primary obstacle to scalability is per-message costs, which appear in the form of inefficient shuffles and coordination overhead.

Citation

Pages: 593-612

Page count: 20
