MilliSort and MilliQuery: Large-Scale Data-Intensive Computing in Milliseconds

Authors
Li, Yilong [1]
Park, Seo Jin [2]
Ousterhout, John [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] MIT CSAIL, Cambridge, MA USA
Source
PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON NETWORKED SYSTEM DESIGN AND IMPLEMENTATION | 2021
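Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract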
Today's datacenter applications couple scale and time: applications that harness large numbers of servers also execute for long periods of time (seconds or more). This paper explores the possibility of flash bursts: applications that use a large number of servers but for very short time intervals (as little as one millisecond). In order to learn more about the feasibility of flash bursts, we developed two new benchmarks, MilliSort and MilliQuery. MilliSort is a sorting application and MilliQuery implements three SQL queries. The goal for both applications was to process as many records as possible in one millisecond, given unlimited resources in a datacenter. The short time scale required a new distributed sorting algorithm for MilliSort that uses a hierarchical form of partitioning. Both applications depended on fast group communication primitives such as shuffle and all-gather. Our implementation of MilliSort can sort 0.84 million items in one millisecond using 120 servers on an HPC cluster; MilliQuery can process 0.03-48 million items in one millisecond using 60-280 servers, depending on the query. The number of items that each application can process grows quadratically with the time budget. The primary obstacle to scalability is per-message costs, which appear in the form of inefficient shuffles and coordination overhead.
Pages: 593-612
Page count: 20