MilliSort and MilliQuery: Large-Scale Data-Intensive Computing in Milliseconds

Cited by: 0
Authors
Li, Yilong [1]
Park, Seo Jin [2]
Ousterhout, John [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] MIT CSAIL, Cambridge, MA USA
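Source
PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON NETWORKED SYSTEM DESIGN AND IMPLEMENTATION | 2021
Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract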
Today's datacenter applications couple scale and time: applications that harness large numbers of servers also execute for long periods of time (seconds or more). This paper explores the possibility of flash bursts: applications that use a large number of servers but for very short time intervals (as little as one millisecond). In order to learn more about the feasibility of flash bursts, we developed two new benchmarks, MilliSort and MilliQuery. MilliSort is a sorting application and MilliQuery implements three SQL queries. The goal for both applications was to process as many records as possible in one millisecond, given unlimited resources in a datacenter. The short time scale required a new distributed sorting algorithm for MilliSort that uses a hierarchical form of partitioning. Both applications depended on fast group communication primitives such as shuffle and all-gather. Our implementation of MilliSort can sort 0.84 million items in one millisecond using 120 servers on an HPC cluster; MilliQuery can process 0.03-48 million items in one millisecond using 60-280 servers, depending on the query. The number of items that each application can process grows quadratically with the time budget. The primary obstacle to scalability is per-message costs, which appear in the form of inefficient shuffles and coordination overhead.
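The abstract mentions partitioning and shuffle as the building blocks of the sort. As a rough illustration only, the sketch below simulates a flat, single-level sample sort inside one process: simulated servers sample their keys, splitters are chosen from the samples, records are shuffled to the bucket that owns their key range, and each bucket is sorted locally. This is not the hierarchical algorithm the paper describes; the function name `sample_sort` and its parameters are hypothetical and chosen for this example.

```python
import bisect
import random


def sample_sort(records, num_servers, samples_per_server=8, seed=0):
    """Simulate a one-level partition-based sort (an assumption, not MilliSort itself)."""
    rng = random.Random(seed)
    # Deal the input out round-robin, standing in for data already spread over servers.
    local = [records[i::num_servers] for i in range(num_servers)]

    # Each simulated server contributes a few random samples of its local keys.
    samples = []
    for part in local:
        k = min(samples_per_server, len(part))
        samples.extend(rng.sample(part, k))
    samples.sort()

    # Pick num_servers - 1 splitters at evenly spaced positions in the sorted samples.
    if samples:
        splitters = [samples[(i * len(samples)) // num_servers]
                     for i in range(1, num_servers)]
    else:
        splitters = []

    # Shuffle: every record is routed to the bucket that owns its key range.
    buckets = [[] for _ in range(num_servers)]
    for part in local:
        for key in part:
            buckets[bisect.bisect_right(splitters, key)].append(key)

    # Each server sorts its own bucket; concatenating buckets gives the global order.
    return [sorted(bucket) for bucket in buckets]
```

Concatenating the returned buckets in order yields the fully sorted input. In a real deployment every routing step in the shuffle loop would be a network message, which is where the per-message costs named in the abstract would show up.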
Pages: 593-612
Page count: 20