MilliSort and MilliQuery: Large-Scale Data-Intensive Computing in Milliseconds

Cited by: 0

Authors:

Li, Yilong [1]

Park, Seo Jin [2]

Ousterhout, John [1]

Affiliations:

[1] Stanford Univ, Stanford, CA 94305 USA

[2] MIT CSAIL, Cambridge, MA USA

Source:

PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION | 2021

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Today's datacenter applications couple scale and time: applications that harness large numbers of servers also execute for long periods of time (seconds or more). This paper explores the possibility of flash bursts: applications that use a large number of servers but for very short time intervals (as little as one millisecond). In order to learn more about the feasibility of flash bursts, we developed two new benchmarks, MilliSort and MilliQuery. MilliSort is a sorting application and MilliQuery implements three SQL queries. The goal for both applications was to process as many records as possible in one millisecond, given unlimited resources in a datacenter. The short time scale required a new distributed sorting algorithm for MilliSort that uses a hierarchical form of partitioning. Both applications depended on fast group communication primitives such as shuffle and all-gather. Our implementation of MilliSort can sort 0.84 million items in one millisecond using 120 servers on an HPC cluster; MilliQuery can process 0.03-48 million items in one millisecond using 60-280 servers, depending on the query. The number of items that each application can process grows quadratically with the time budget. The primary obstacle to scalability is per-message costs, which appear in the form of inefficient shuffles and coordination overhead.
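For readers unfamiliar with the terms in the abstract, the following is a minimal single-process sketch of a partition-then-shuffle sort. It is not the paper's algorithm: MilliSort uses a hierarchical form of partitioning, whereas this sketch uses one flat level of sampled splitters, and the "servers" are plain Python lists. The names `choose_splitters`, `partitioned_sort`, and `samples_per_server` are hypothetical and chosen for illustration only.

```python
import bisect
import random


def choose_splitters(samples, num_parts):
    """Pick num_parts - 1 splitters at evenly spaced ranks of the sorted samples."""
    ordered = sorted(samples)
    if not ordered:
        return []  # no data anywhere: every key range collapses onto server 0
    return [ordered[(i * len(ordered)) // num_parts] for i in range(1, num_parts)]


def partitioned_sort(server_data, samples_per_server=8, seed=0):
    """Simulate a flat, sample-based partitioned sort over a list of 'servers'."""
    rng = random.Random(seed)
    num_servers = len(server_data)

    # Step 1: every server sorts locally and contributes a few sampled keys.
    local = [sorted(chunk) for chunk in server_data]
    samples = []
    for chunk in local:
        samples.extend(rng.sample(chunk, min(samples_per_server, len(chunk))))

    # Step 2: agree on splitters. A distributed system would exchange the
    # samples over the network (for example with a gather or all-gather).
    splitters = choose_splitters(samples, num_servers)

    # Step 3: shuffle. Each key is sent to the server that owns its key range.
    inboxes = [[] for _ in range(num_servers)]
    for chunk in local:
        for key in chunk:
            inboxes[bisect.bisect_right(splitters, key)].append(key)

    # Step 4: each server sorts what it received. Concatenating the servers'
    # outputs in order yields the globally sorted sequence.
    return [sorted(inbox) for inbox in inboxes]
```

In this sketch the shuffle in step 3 moves every key once, and each server may send to every other server, which gives a rough sense of why per-message costs matter as the number of servers grows.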


Pages: 593-612

Page count: 20
