Sparrow: Distributed, Low Latency Scheduling

Cited by: 323

Authors:

Ousterhout, Kay [1]

Wendell, Patrick [1]

Zaharia, Matei [1]

Stoica, Ion [1]

Affiliations:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

SOSP'13: PROCEEDINGS OF THE TWENTY-FOURTH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES | 2013

Keywords:

DOI:

10.1145/2517349.2522716

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.
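The abstract names a "decentralized, randomized sampling approach" without spelling it out in this record. As a rough illustration only, the sketch below shows one generic form of randomized sampling for task placement: probe a few randomly chosen workers and place the task on the one with the shortest queue. It is not taken from the Sparrow implementation; the function names `place_task` and `simulate`, the `probes` parameter, and the queue-length model are all assumptions made for this example.

```python
import random


def place_task(queue_lengths, probes=2, rng=random):
    """Probe `probes` distinct random workers and return the index of the
    one with the shortest queue (hypothetical helper, not Sparrow's API)."""
    candidates = rng.sample(range(len(queue_lengths)), probes)
    return min(candidates, key=lambda w: queue_lengths[w])


def simulate(num_workers=100, num_tasks=1000, probes=2, seed=0):
    """Place tasks one at a time and return the final queue lengths.
    Tasks never complete in this toy model; it only shows placement."""
    rng = random.Random(seed)
    queues = [0] * num_workers
    for _ in range(num_tasks):
        queues[place_task(queues, probes, rng)] += 1
    return queues


if __name__ == "__main__":
    for d in (1, 2):
        print(f"probes={d}: longest queue = {max(simulate(probes=d))}")
```

No scheduler in this sketch needs a global view of the cluster: each placement touches only the sampled workers, which is the property that lets such a design avoid a central bottleneck.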

Citation:

Pages: 69-84

Page count: 16
