Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors

Cited by: 0

Authors:

Garcia, Victor [1,2]

Rico, Alejandro [2]

Villavieja, Carlos [3]

Carpenter, Paul [2]

Navarro, Nacho [1,2]

Ramirez, Alex [4]

Affiliations:

[1] Univ Politecn Cataluna, Barcelona, Spain

[2] Barcelona Supercomp Ctr, Barcelona, Spain

[3] Google Inc, New York, NY USA

[4] NVIDIA Corp, Santa Clara, CA USA

Source:

International Journal of Parallel Programming | 2017 / Vol. 45 / Issue 03

Keywords:

Cache memories; Prefetch; Task-based programming models

DOI:

10.1007/s10766-016-0431-8

Chinese Library Classification (CLC) number:

TP301 [Theory and methods]

Discipline classification code:

081202

Abstract:

Memory stalls are a significant source of performance degradation in modern processors. Data prefetching is a widely adopted and well-studied technique for alleviating this problem. Prefetching can be performed by the hardware, or be initiated and controlled by software. Software-controlled prefetching covers a wide variety of schemes, including runtime-directed prefetching and, more specifically, runtime-directed block prefetching. This paper proposes a hybrid prefetching mechanism that integrates a software-driven block prefetcher with existing hardware prefetching techniques. Our runtime-assisted software prefetcher brings large blocks of data on-chip with the support of a low-cost hardware engine, and works in synergy with existing hardware prefetchers that manage locality at a finer granularity. The runtime system that drives the prefetch engine dynamically selects which cache to prefetch to. Our evaluation on a set of scientific benchmarks obtains a maximum speedup of 32 %, and 10 % on average, compared to a baseline with hardware prefetching only. As a result, we also achieve a reduction in energy-to-solution of up to 18 %, and 3 % on average.

Pages: 530-550

Page count: 21

