Architectural support for task scheduling: hardware scheduling for dataflow on NUMA systems

Cited by: 0
Authors
Behram Khan
Daniel Goodman
Salman Khan
Will Toms
Paolo Faraboschi
Mikel Luján
Ian Watson
Affiliations
[1] BT Research
[2] Solarflare Communications
[3] The University of Manchester
[4] HP Labs
Source
The Journal of Supercomputing | 2015 / Vol. 71
Keywords
Scheduling; Hardware scheduling; Task-based application; Dataflow;
DOI
Not available
Abstract
To harness the compute resources of many-core systems with tens to hundreds of cores, applications have to expose parallelism to the hardware. Researchers are actively looking for program execution models that make it easier to expose parallelism and use the available resources. One common approach is to decompose a program into parallel ‘tasks’ and allow an underlying system layer to schedule these tasks onto different threads. Software-only schedulers can implement various scheduling policies and algorithms that match the characteristics of different applications and programming models. Unfortunately, on large-scale multi-core systems, software schedulers suffer significant overheads as they synchronize and communicate task information over deep cache hierarchies. To reduce these overheads, hardware-only schedulers such as Carbon have been proposed, which perform task queuing and scheduling in hardware. This paper presents a hardware scheduling approach in which the structure that task-based programming models give to programs is incorporated into the scheduler, making it aware of a task’s data requirements. This prior knowledge allows the scheduler to place tasks better, which reduces overall cache misses and memory traffic and improves the program’s performance and power utilization. Simulations of this technique for a range of synthetic benchmarks and components of real applications show reductions in the number of cache misses of up to 72 % for the L1 cache and 95 % for the L2 cache, and an improvement of up to 30 % in overall execution time against FIFO scheduling. The result is not only faster execution and up to 50 % less data transfer, which lowers the load on the interconnect, but also lower power consumption.
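The placement idea in the abstract can be illustrated with a small software toy. The sketch below is not the paper's hardware design: the `Core` class, the `simulate` function, the LRU cache model, the round-robin stand-in for FIFO scheduling, and the workload are all assumptions made for illustration. It only shows why sending a task to the core that already holds its data tends to reduce cache misses.

```python
from collections import OrderedDict


class Core:
    """Toy core (hypothetical model): a small LRU cache of data-block ids."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of cached blocks
        self.cache = OrderedDict()    # block id -> True, least recent first
        self.misses = 0
        self.tasks_run = 0

    def overlap(self, blocks):
        """Count how many of a task's blocks are already cached here."""
        return sum(1 for b in blocks if b in self.cache)

    def run(self, blocks):
        """Touch every block of a task, counting misses and updating LRU order."""
        self.tasks_run += 1
        for b in blocks:
            if b in self.cache:
                self.cache.move_to_end(b)           # hit: mark as most recent
            else:
                self.misses += 1                    # miss: load the block
                self.cache[b] = True
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)  # evict least recent block


def simulate(tasks, n_cores, capacity, data_aware):
    """Run tasks in order and return the total number of cache misses.

    data_aware=False: round-robin placement, an assumed stand-in for FIFO.
    data_aware=True: pick the core caching most of the task's blocks,
    breaking ties in favour of the core that has run fewer tasks.
    """
    cores = [Core(capacity) for _ in range(n_cores)]
    for i, blocks in enumerate(tasks):
        if data_aware:
            core = max(cores, key=lambda c: (c.overlap(blocks), -c.tasks_run))
        else:
            core = cores[i % n_cores]
        core.run(blocks)
    return sum(c.misses for c in cores)


# Assumed workload: 12 tasks cycling over three 2-block data groups.
groups = [(0, 1), (2, 3), (4, 5)]
tasks = [groups[i % 3] for i in range(12)]

fifo_misses = simulate(tasks, n_cores=2, capacity=4, data_aware=False)
aware_misses = simulate(tasks, n_cores=2, capacity=4, data_aware=True)
print(fifo_misses, aware_misses)  # prints "24 6"
```

In this toy workload, round-robin placement makes each core cycle through all three data groups with room for only two, so every access misses (24 misses), while data-aware placement misses only on the first touch of each group (6 misses). These figures describe the toy alone and say nothing about the simulation results reported in the abstract. The data-aware policy here also loads the two cores unevenly, a trade-off a real scheduler would have to weigh.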
Pages: 2309 - 2338
Page count: 29
Related papers
50 items in total
  • [1] Architectural support for task scheduling: hardware scheduling for dataflow on NUMA systems
    Khan, Behram
    Goodman, Daniel
    Khan, Salman
    Toms, Will
    Faraboschi, Paolo
    Lujan, Mikel
    Watson, Ian
    JOURNAL OF SUPERCOMPUTING, 2015, 71 (06): 2309 - 2338
  • [2] OpenMP task scheduling strategies for multicore NUMA systems
    Olivier, Stephen L.
    Porterfield, Allan K.
    Wheeler, Kyle B.
    Spiegel, Michael
    Prins, Jan F.
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2012, 26 (02): 110 - 124
  • [3] Task Scheduling in Sucuri Dataflow Library
    Silva, Rafael J. N.
    Goldstein, Brunno
    Santiago, Leandro
    Sena, Alexandre C.
    Marzulo, Leandro A. J.
    Alves, Tiago A. O.
    Franca, Felipe M. G.
    2016 28TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING WORKSHOPS (SBAC-PADW), 2016: 37 - 42
  • [4] Architectural Support for Task Dependence Management with Flexible Software Scheduling
    Castillo, Emilio
    Alvarez, Lluc
    Moreto, Miquel
    Casas, Marc
    Vallejo, Enrique
    Luis Bosque, Jose
    Beivide, Ramon
    Valero, Mateo
    2018 24TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA), 2018: 283 - 295
  • [5] Streaming Task Graph Scheduling for Dataflow Architectures
    De Matteis, Tiziano
    Gianinazzi, Lukas
    Licht, Johannes de Fine
    Hoefler, Torsten
    PROCEEDINGS OF THE 32ND INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, HPDC 2023, 2023: 225 - 237
  • [6] Task scheduling in distributed decision support systems
    Trakhtengerts, EA
    AUTOMATION AND REMOTE CONTROL, 1996, 57 (08) : 1207 - 1215
  • [7] Task Scheduling in Distributed Decision Support Systems
    Autom Remote Control, 2 (1207):
  • [8] A new scheduling strategy for NUMA multiprocessor systems
    Lai, GJ
    Chen, C
    1996 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, PROCEEDINGS, 1996: 222 - 229
  • [9] vProbe: Scheduling Virtual Machines on NUMA Systems
    Wu, Song
    Sun, Huahua
    Zhou, Like
    Gan, Qingtian
    Jin, Hai
    2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016: 70 - 79
  • [10] Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures
    Clet-Ortega, Jerome
    Carribault, Patrick
    Perache, Marc
    EURO-PAR 2014 PARALLEL PROCESSING, 2014, 8632 : 596 - 607