RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers

Authors
Zhu, Hang [1]
Kaffes, Kostis [2]
Chen, Zixu [1]
Liu, Zhenming [3]
Kozyrakis, Christos [2]
Stoica, Ion [4]
Jin, Xin [1]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Coll William & Mary, Williamsburg, VA 23187 USA
[4] Univ Calif Berkeley, Berkeley, CA USA
Source
PROCEEDINGS OF THE 14TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '20) | 2020
Keywords
TAIL;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as application demands continue to increase, scaling up is not enough, and serving larger demands requires these systems to scale out to multiple servers in a rack. We present RackSched, the first rack-level microsecond-scale scheduler that, through network-system co-design, provides an external service with the abstraction of a rack-scale computer (i.e., a huge server with hundreds to thousands of cores). The core of RackSched is a two-layer scheduling framework that integrates inter-server scheduling in the top-of-rack (ToR) switch with intra-server scheduling in each server. We use a combination of analytical results and simulations to show that it provides near-optimal performance, comparable to that of centralized scheduling policies, and is robust for both low-dispersion and high-dispersion workloads. We design a custom switch data plane for the inter-server scheduler, which realizes power-of-k-choices, ensures request affinity, and tracks server loads accurately and efficiently. We implement a RackSched prototype on a cluster of commodity servers connected by a Barefoot Tofino switch. End-to-end experiments on a twelve-server testbed show that RackSched improves throughput by up to 1.44× and scales out throughput near-linearly, while maintaining the same tail latency as one server until the system is saturated.
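The power-of-k-choices policy named in the abstract can be illustrated with a small Python sketch. This is not the paper's switch data plane; it is a host-side toy under stated assumptions: the function names `pick_server` and `simulate` are hypothetical, load is a plain per-server request counter, requests never complete, and request affinity and load tracking are not modeled.

```python
import random
from typing import List, Optional


def pick_server(loads: List[int], k: int = 2,
                rng: Optional[random.Random] = None) -> int:
    """Sample k distinct servers uniformly at random and return the
    index of the least-loaded one (power-of-k-choices)."""
    rng = rng or random.Random()
    candidates = rng.sample(range(len(loads)), k)
    return min(candidates, key=lambda i: loads[i])


def simulate(num_servers: int = 12, num_requests: int = 10_000,
             k: int = 2, seed: int = 0) -> List[int]:
    """Assign requests one by one and return the final per-server counts.
    Assumption: requests never finish, so counts only grow."""
    rng = random.Random(seed)
    loads = [0] * num_servers
    for _ in range(num_requests):
        loads[pick_server(loads, k, rng)] += 1
    return loads


if __name__ == "__main__":
    for k in (1, 2):
        loads = simulate(k=k)
        print(f"k={k}: imbalance (max - min) = {max(loads) - min(loads)}")
```

With k = 1 the policy degenerates to purely random assignment; sampling even two candidates and choosing the less loaded one typically yields a much smaller gap between the busiest and idlest server in this toy model.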
Pages: 1225-1240
Number of pages: 16
Related papers
50 in total
  • [1] XFabric: A Reconfigurable In-Rack Network for Rack-Scale Computers
    Legtchenko, Sergey
    Chen, Nicholas
    Cletheroe, Daniel
    Rowstron, Antony
    Williams, Hugh
    Zhao, Xiaohan
    13TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION (NSDI '16), 2016: 15-29
  • [2] Microsecond-Scale Core Reallocation
    Queue, 2023, 21(02): 3-4
  • [3] Exploration of FPGA-Based Packet Switches for Rack-Scale Computers on a Board
    Han, Jong Hun
    Manihatty-Bojan, Neelakandan
    Moore, Andrew W.
    2017 IEEE 25TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2017), 2017: 133
  • [4] R2C2: A Network Stack for Rack-scale Computers
    Costa, Paolo
    Ballani, Hitesh
    Razavi, Kaveh
    Kash, Ian
    ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2015, 45(04): 551-564
  • [5] R2C2: A Network Stack for Rack-scale Computers
    Costa, Paolo
    Ballani, Hitesh
    Razavi, Kaveh
    Kash, Ian
    SIGCOMM'15: PROCEEDINGS OF THE 2015 ACM CONFERENCE ON SPECIAL INTEREST GROUP ON DATA COMMUNICATION, 2015: 551-564
  • [6] High speed adaptive rack-scale fabrics
    Sella, Omer S.
    Moore, Andrew W.
    Zilberman, Noa
    SIGCOMM'18: PROCEEDINGS OF THE ACM SIGCOMM 2018 CONFERENCE: POSTERS AND DEMOS, 2018: 33-35
  • [7] Efficient Scheduling Policies for Microsecond-Scale Tasks
    McClure, Sarah
    Ousterhout, Amy
    Shenker, Scott
    Ratnasamy, Sylvia
    PROCEEDINGS OF THE 19TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION (NSDI '22), 2022: 1-18
  • [8] Decibel: Isolation and Sharing in Disaggregated Rack-Scale Storage
    Nanavati, Mihir
    Wires, Jake
    Warfield, Andrew
    PROCEEDINGS OF NSDI '17: 14TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION, 2017: 17-33
  • [9] uBFT: Microsecond-Scale BFT using Disaggregated Memory
    Aguilera, Marcos K.
    Ben-David, Naama
    Guerraoui, Rachid
    Murat, Antoine
    Xygkis, Athanasios
    Zablotchi, Igor
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, VOL 2, ASPLOS 2023, 2023: 862-877
  • [10] Ultrafast cooling reveals microsecond-scale biomolecular dynamics
    Polinkovsky, Mark E.
    Gambin, Yann
    Banerjee, Priya R.
    Erickstad, Michael J.
    Groisman, Alex
    Deniz, Ashok A.
    NATURE COMMUNICATIONS, 2014, 5