Architectural support for task scheduling: hardware scheduling for dataflow on NUMA systems

Cited by: 0

Authors:

Behram Khan

Daniel Goodman

Salman Khan

Will Toms

Paolo Faraboschi

Mikel Luján

Ian Watson

Affiliations:

[1] BT Research

[2] Solarflare Communications

[3] The University of Manchester

[4] HP Labs

Source:

The Journal of Supercomputing | 2015 / Vol. 71

Keywords:

Scheduling; Hardware scheduling; Task-based application; Dataflow

DOI:

Not available

Abstract:

To harness the compute resources of many-core systems with tens to hundreds of cores, applications have to expose parallelism to the hardware. Researchers are aggressively looking for program execution models that make it easier to expose parallelism and use the available resources. One common approach is to decompose a program into parallel ‘tasks’ and allow an underlying system layer to schedule these tasks onto different threads. Software-only schedulers can implement various scheduling policies and algorithms that match the characteristics of different applications and programming models. Unfortunately, on large-scale multi-core systems, software schedulers suffer significant overheads as they synchronize and communicate task information over deep cache hierarchies. To reduce these overheads, hardware-only schedulers such as Carbon have been proposed, which perform task queuing and scheduling in hardware. This paper presents a hardware scheduling approach in which the structure that task-based programming models provide to programs is incorporated into the scheduler, making it aware of each task’s data requirements. This prior knowledge of a task’s data requirements allows the scheduler to place tasks better, which reduces overall cache misses and memory traffic, improving the program’s performance and reducing its power consumption. Simulations of this technique on a range of synthetic benchmarks and components of real applications show reductions in the number of cache misses of up to 72% for the L1 cache and up to 95% for the L2 cache, and up to a 30% improvement in overall execution time compared with FIFO scheduling. This results not only in faster execution and in less data transfer (reductions of up to 50%), which lowers the load on the interconnect, but also in lower power consumption.
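The central idea of the abstract, placing each task on the core whose cache already holds its inputs instead of dispatching tasks in FIFO order, can be illustrated with a toy software model. This is a hypothetical sketch, not the authors' hardware scheduler: it assumes per-core LRU caches measured in data blocks, tasks that declare their input blocks up front, and a round-robin FIFO baseline. The names `Core`, `run`, and `make_workload` are invented for this example, and the model counts cache misses only; it does not model timing, the interconnect, or NUMA distances.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy core with a private LRU cache of data-block IDs (hypothetical model)."""
    capacity: int
    cache: OrderedDict = field(default_factory=OrderedDict)
    misses: int = 0
    tasks_run: int = 0

    def access(self, block: int) -> None:
        if block in self.cache:
            self.cache.move_to_end(block)  # hit: mark as most recently used
        else:
            self.misses += 1  # miss: fetch the block into the cache
            self.cache[block] = True
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used block

    def overlap(self, blocks: list[int]) -> int:
        """Number of the task's input blocks already resident in this cache."""
        return sum(1 for b in blocks if b in self.cache)


def run(tasks: list[list[int]], n_cores: int, capacity: int, data_aware: bool) -> int:
    """Dispatch tasks in arrival order and return the total number of cache misses."""
    cores = [Core(capacity) for _ in range(n_cores)]
    for i, blocks in enumerate(tasks):
        if data_aware:
            # Assumed policy: prefer the core caching most of the task's inputs,
            # breaking ties toward the core that has run the fewest tasks
            # (and then the lowest index, since max() keeps the first maximum).
            target = max(range(n_cores),
                         key=lambda c: (cores[c].overlap(blocks), -cores[c].tasks_run))
        else:
            # FIFO baseline: ignore data requirements, hand tasks out round-robin.
            target = i % n_cores
        cores[target].tasks_run += 1
        for b in blocks:
            cores[target].access(b)
    return sum(core.misses for core in cores)


def make_workload(regions: int = 4, blocks_per_region: int = 4,
                  rounds: int = 2) -> list[list[int]]:
    """Hypothetical synthetic workload: each data region is read by two
    back-to-back tasks, and the whole sequence repeats `rounds` times."""
    tasks = []
    for _ in range(rounds):
        for r in range(regions):
            region = list(range(r * blocks_per_region, (r + 1) * blocks_per_region))
            tasks.extend([region, region])
    return tasks


if __name__ == "__main__":
    work = make_workload()
    print("FIFO misses:", run(work, n_cores=4, capacity=4, data_aware=False))
    print("data-aware misses:", run(work, n_cores=4, capacity=4, data_aware=True))
```

With four cores, four-block caches, and this workload, the round-robin baseline keeps sending the same region to different cores and evicting it (64 misses), while the data-aware policy incurs only the 16 compulsory misses. The least-loaded tie-break keeps the data-aware policy from piling every task with no cached inputs onto core 0; a real scheduler would also have to weigh locality against load balance and execution time, which this sketch ignores.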

Pages: 2309-2338

Page count: 29

50 results in total:

[1] Architectural support for task scheduling: hardware scheduling for dataflow on NUMA systems
Khan, Behram
Goodman, Daniel
Khan, Salman
Toms, Will
Faraboschi, Paolo
Lujan, Mikel
Watson, Ian
JOURNAL OF SUPERCOMPUTING, 2015, 71 (06): 2309-2338
[2] OpenMP task scheduling strategies for multicore NUMA systems
Olivier, Stephen L.
Porterfield, Allan K.
Wheeler, Kyle B.
Spiegel, Michael
Prins, Jan F.
INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2012, 26 (02): 110-124
[3] Task Scheduling in Sucuri Dataflow Library
Silva, Rafael J. N.
Goldstein, Brunno
Santiago, Leandro
Sena, Alexandre C.
Marzulo, Leandro A. J.
Alves, Tiago A. O.
Franca, Felipe M. G.
2016 28TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING WORKSHOPS (SBAC-PADW), 2016: 37-42
[4] Architectural Support for Task Dependence Management with Flexible Software Scheduling
Castillo, Emilio
Alvarez, Lluc
Moreto, Miquel
Casas, Marc
Vallejo, Enrique
Luis Bosque, Jose
Beivide, Ramon
Valero, Mateo
2018 24TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2018: 283-295
[5] Streaming Task Graph Scheduling for Dataflow Architectures
De Matteis, Tiziano
Gianinazzi, Lukas
Licht, Johannes de Fine
Hoefler, Torsten
PROCEEDINGS OF THE 32ND INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, HPDC 2023, 2023: 225-237
[6] Task scheduling in distributed decision support systems
Trakhtengerts, EA
AUTOMATION AND REMOTE CONTROL, 1996, 57 (08): 1207-1215
[7] Task Scheduling in Distributed Decision Support Systems
Autom Remote Control, 2 (1207)
[8] A new scheduling strategy for NUMA multiprocessor systems
Lai, GJ
Chen, C
1996 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, PROCEEDINGS, 1996: 222-229
[9] vProbe: Scheduling Virtual Machines on NUMA Systems
Wu, Song
Sun, Huahua
Zhou, Like
Gan, Qingtian
Jin, Hai
2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016: 70-79
[10] Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures
Clet-Ortega, Jerome
Carribault, Patrick
Perache, Marc
EURO-PAR 2014 PARALLEL PROCESSING, 2014, 8632: 596-607