Optimising parallel programs for hardware implementation

Cited by: 0

Authors:

Coutinho, JGF [1]

Luk, W [1]

Weinhardt, M [1]

Affiliation:

[1] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2BZ, England

Source:

RECONFIGURABLE TECHNOLOGY: FPGAS AND RECONFIGURABLE PROCESSORS FOR COMPUTING AND COMMUNICATIONS IV | 2002, Vol. 4867

Keywords:

Program transformations; sequentialisation; pipeline vectorization; loop pipelining

DOI:

10.1117/12.455467

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

This paper describes an approach for optimizing hardware designs produced from software languages extended with constructs for parallel execution and hardware processing, such as the Handel-C language. Our aim is to optimize these programs by applying transformations that include the appropriate amount of parallelism, in order to obtain the best trade-offs in space and in time. These transformations can be applied automatically at compile time, enabling the programmer to adapt parallel programs rapidly to a specific hardware platform. Our transformational approach, which involves design sequentialisation and parallelisation, contains two novel features. First, we develop an algorithm for sequentialising parallel programs. This algorithm relaxes the scheduling of the original design, giving a scheduler the freedom to arrange it to achieve better results in speed, in size, or in both. Second, we combine this sequentialisation algorithm with pipeline vectorization, a technique known to reduce the execution delay of loops by pipelining the loop body. We adapt several transformation techniques used in vectorizing and parallelizing software compilers, such as loop unrolling and loop tiling, to widen the applicability of our method. Results show that our approach often works well: for instance a manually pipelined convolution design, for implementation in a Xilinx XC4000 device produced from a Handel-C description, is speeded up by over 2 times by our prototype compiler.
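As a rough illustration of two of the software-compiler transformations the abstract names, loop unrolling and loop tiling, the sketch below applies them by hand to a small 1-D convolution in plain C. It is not the authors' algorithm or their Handel-C code; the function names, the kernel size `TAPS`, and the tile size `TILE` are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 4  /* hypothetical kernel size, chosen for illustration */
#define TILE 4  /* hypothetical tile size, chosen for illustration */

/* Reference form: out[i] = sum over k of in[i + k] * coef[k]. */
void conv_ref(const int *in, const int *coef, int *out, size_t n)
{
    assert(n >= TAPS);
    for (size_t i = 0; i + TAPS <= n; i++) {
        int acc = 0;
        for (size_t k = 0; k < TAPS; k++)
            acc += in[i + k] * coef[k];
        out[i] = acc;
    }
}

/* Loop unrolling: the inner loop is fully unrolled, which exposes four
 * independent multiplications that a scheduler could place in parallel. */
void conv_unrolled(const int *in, const int *coef, int *out, size_t n)
{
    assert(n >= TAPS);
    for (size_t i = 0; i + TAPS <= n; i++) {
        out[i] = in[i]     * coef[0]
               + in[i + 1] * coef[1]
               + in[i + 2] * coef[2]
               + in[i + 3] * coef[3];
    }
}

/* Loop tiling: the outer loop is split into blocks of TILE iterations,
 * with the last block shortened when the count is not a multiple of TILE. */
void conv_tiled(const int *in, const int *coef, int *out, size_t n)
{
    assert(n >= TAPS);
    size_t count = n - TAPS + 1;  /* number of output samples */
    for (size_t t = 0; t < count; t += TILE) {
        size_t end = (t + TILE < count) ? t + TILE : count;
        for (size_t i = t; i < end; i++) {
            int acc = 0;
            for (size_t k = 0; k < TAPS; k++)
                acc += in[i + k] * coef[k];
            out[i] = acc;
        }
    }
}
```

All three functions compute the same outputs; only the loop structure differs, which is the property a transformation-based compiler relies on when it trades space against time.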


Pages: 60-70

Page count: 11
