Optimising parallel programs for hardware implementation

Authors
Coutinho, JGF [1]
Luk, W [1]
Weinhardt, M [1]
Affiliation
[1] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2BZ, England
Keywords
Program transformations; sequentialisation; pipeline vectorization; loop pipelining
DOI
10.1117/12.455467
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper describes an approach for optimising hardware designs produced from software languages extended with constructs for parallel execution and hardware processing, such as Handel-C. Our aim is to optimise these programs by applying transformations that yield an appropriate degree of parallelism, so as to obtain the best trade-offs between space and time. These transformations can be applied automatically at compile time, enabling the programmer to adapt parallel programs rapidly to a specific hardware platform. Our transformational approach, which involves both sequentialisation and parallelisation of designs, has two novel features. First, we develop an algorithm for sequentialising parallel programs. This algorithm relaxes the scheduling of the original design, giving a scheduler the freedom to rearrange the design for better speed, smaller size, or both. Second, we combine this sequentialisation algorithm with pipeline vectorization, a technique known to reduce the execution delay of loops by pipelining the loop body. To widen the applicability of our method, we adapt several transformation techniques used in vectorising and parallelising software compilers, such as loop unrolling and loop tiling. Results show that our approach often works well: for instance, our prototype compiler speeds up a manually pipelined convolution design, produced from a Handel-C description for implementation in a Xilinx XC4000 device, by more than a factor of two.
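The loop transformations named in the abstract can be illustrated at source level. The following is a hypothetical Python sketch, not the authors' prototype compiler or Handel-C output: it unrolls (by a factor of two) and tiles the outer loop of a 1-D convolution and checks that both variants compute the same result as the original loop. The function names (`conv_ref`, `conv_unrolled2`, `conv_tiled`, `sequential_cycles`, `pipelined_cycles`) are invented for illustration. The cycle-count model is also an assumption: an idealised loop body of `stages` single-cycle stages with no hazards and a fixed initiation interval `ii`. Real hardware timing will differ.

```python
"""Hypothetical sketch: loop unrolling and loop tiling of a 1-D convolution,
plus an idealised cycle model for a pipelined loop. Illustrative only."""


def conv_ref(x, w):
    """Original loop: 'valid' 1-D convolution, y[i] = sum_j x[i + j] * w[j]."""
    m = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(m)]


def conv_unrolled2(x, w):
    """Outer loop unrolled by 2, with a remainder step when the count is odd."""
    k = len(w)
    m = len(x) - k + 1
    y = [0] * m
    i = 0
    while i + 1 < m:
        # Two independent copies of the loop body: in hardware these
        # could be scheduled to execute in parallel.
        y[i] = sum(x[i + j] * w[j] for j in range(k))
        y[i + 1] = sum(x[i + 1 + j] * w[j] for j in range(k))
        i += 2
    if i < m:  # leftover iteration
        y[i] = sum(x[i + j] * w[j] for j in range(k))
    return y


def conv_tiled(x, w, tile=4):
    """Outer loop split into tiles of `tile` iterations (assumed tile size)."""
    k = len(w)
    m = len(x) - k + 1
    y = [0] * m
    for t in range(0, m, tile):  # loop over tiles
        for i in range(t, min(t + tile, m)):  # loop within one tile
            y[i] = sum(x[i + j] * w[j] for j in range(k))
    return y


def sequential_cycles(n_iter, stages):
    """Each iteration finishes all stages before the next one starts."""
    return n_iter * stages


def pipelined_cycles(n_iter, stages, ii=1):
    """Pipeline fill latency, then one new iteration every `ii` cycles."""
    return stages + (n_iter - 1) * ii if n_iter > 0 else 0


if __name__ == "__main__":
    x = list(range(1, 11))
    w = [1, 2, 3]
    assert conv_unrolled2(x, w) == conv_ref(x, w) == conv_tiled(x, w, tile=3)
    print(sequential_cycles(100, 4), pipelined_cycles(100, 4))  # 400 103
```

With 100 iterations of a four-stage body, the model gives 400 cycles sequentially and 103 when pipelined. The model deliberately ignores the resource, memory-port and dependence constraints that limit how far a real design can be unrolled or pipelined.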
Pages: 60-70
Page count: 11