Partition Scheduling on Heterogeneous Multicore Processors for Multi-dimensional Loops Applications

Cited by: 0

Authors:

Yan Wang

Kenli Li

Keqin Li

Affiliations:

[1] Guangzhou University, School of Computer Science and Educational Software

[2] Hunan University, College of Information Science and Engineering

Source:

International Journal of Parallel Programming | 2017 / Vol. 45

Keywords:

Heterogeneous multicore processor; Memory latency; Multi-dimensional loops; Scheduling

DOI:

Not available

Abstract:

This paper addresses the scheduling problem for multi-dimensional loop applications on heterogeneous multicore processors. A significant issue in this problem is how to hide memory latency so as to reduce the schedule length. As CPU speeds increase, the gap between processor and memory performance has become an important bottleneck for modern high-performance computer systems. To address this bottleneck, a variety of latency-hiding techniques have been studied, ranging from intermediate fast memories (caches) to various prefetching and memory management techniques. Although the literature offers many algorithms for scheduling with memory management on multiprocessor systems, they may not deliver good schedule quality or high performance on heterogeneous multicore processors. In this paper, we first propose a scheduling algorithm, Recom_Task_Assign, to reduce write activities to main memory. Then, in conjunction with Recom_Task_Assign, we present a new partition scheduling algorithm called heterogeneous multiprocessor partition (HMP), which is based on the prefetching technique and can hide memory latencies for applications with multi-dimensional loops on heterogeneous multicore processors. This technique takes advantage of memory access pattern information and fully considers the heterogeneity of processors to achieve high processor utilization. The HMP algorithm selects an appropriate partition size and shape according to the different processors, which increases processor utilization and reduces memory latency. Experiments on DSP benchmarks show that our algorithm can efficiently reduce memory latency and enhance parallelism compared with existing methods.

Citation

Pages: 827-852

Page count: 25
