MilliSort and MilliQuery: Large-Scale Data-Intensive Computing in Milliseconds

Authors
Li, Yilong [1]
Park, Seo Jin [2]
Ousterhout, John [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] MIT CSAIL, Cambridge, MA USA
Source
PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION | 2021
Keywords
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Today's datacenter applications couple scale and time: applications that harness large numbers of servers also execute for long periods of time (seconds or more). This paper explores the possibility of flash bursts: applications that use a large number of servers but for very short time intervals (as little as one millisecond). In order to learn more about the feasibility of flash bursts, we developed two new benchmarks, MilliSort and MilliQuery. MilliSort is a sorting application and MilliQuery implements three SQL queries. The goal for both applications was to process as many records as possible in one millisecond, given unlimited resources in a datacenter. The short time scale required a new distributed sorting algorithm for MilliSort that uses a hierarchical form of partitioning. Both applications depended on fast group communication primitives such as shuffle and all-gather. Our implementation of MilliSort can sort 0.84 million items in one millisecond using 120 servers on an HPC cluster; MilliQuery can process 0.03-48 million items in one millisecond using 60-280 servers, depending on the query. The number of items that each application can process grows quadratically with the time budget. The primary obstacle to scalability is per-message costs, which appear in the form of inefficient shuffles and coordination overhead.
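The abstract names partitioning, shuffle, and all-gather as the building blocks of the sort. As a rough illustration only, the sketch below simulates a flat (single-level) partition-based sort across in-memory lists that stand in for servers. It is not the hierarchical algorithm the paper describes, and every name in it (`partition_sort`, `samples_per_server`, the sampling scheme) is a hypothetical choice made for this example.

```python
import bisect
import random


def partition_sort(servers, samples_per_server=8, seed=0):
    """Simulate a flat partition-based sort across in-memory 'servers'.

    servers: list of lists of comparable keys, one inner list per simulated server.
    Returns a list of sorted lists; concatenating them in order gives the global sort.
    This is an illustrative single-level scheme, not the paper's hierarchical one.
    """
    rng = random.Random(seed)
    n = len(servers)

    # Step 1: each server sorts locally and contributes a few sample keys.
    local = [sorted(s) for s in servers]
    samples = []
    for keys in local:
        k = min(samples_per_server, len(keys))
        samples.extend(rng.sample(keys, k))
    samples.sort()

    # Step 2: pick n - 1 splitters from the pooled samples
    # (pooling the samples stands in for an all-gather).
    if n > 1 and samples:
        splitters = [samples[(i * len(samples)) // n] for i in range(1, n)]
    else:
        splitters = []

    # Step 3: shuffle, sending every key to the server that owns its key range.
    buckets = [[] for _ in range(n)]
    for keys in local:
        for key in keys:
            buckets[bisect.bisect_right(splitters, key)].append(key)

    # Step 4: each server sorts the keys it received.
    return [sorted(b) for b in buckets]
```

In this flat form, every server may send a message to every other server during the shuffle, so the message count grows as roughly n(n-1) for n servers. That is one way to picture the per-message costs the abstract identifies as the main obstacle to scalability.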
Pages: 593-612
Page count: 20