Optimising parallel programs for hardware implementation

Times cited: 0
Authors
Coutinho, JGF [1]
Luk, W [1]
Weinhardt, M [1]
Affiliation
[1] Imperial College of Science, Technology and Medicine, University of London, Department of Computing, London SW7 2BZ, England
Source
Reconfigurable Technology: FPGAs and Reconfigurable Processors for Computing and Communications IV | 2002 / Vol. 4867
Keywords
Program transformations; sequentialisation; pipeline vectorization; loop pipelining
DOI
10.1117/12.455467
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
This paper describes an approach for optimizing hardware designs produced from software languages extended with constructs for parallel execution and hardware processing, such as Handel-C. Our aim is to optimize these programs by applying transformations that yield the appropriate amount of parallelism, in order to obtain the best trade-offs between space and time. These transformations can be applied automatically at compile time, enabling the programmer to adapt parallel programs rapidly to a specific hardware platform. Our transformational approach, which involves design sequentialisation and parallelisation, has two novel features. First, we develop an algorithm for sequentialising parallel programs. This algorithm relaxes the schedule of the original design, giving a scheduler the freedom to rearrange it for better speed, smaller size, or both. Second, we combine this sequentialisation algorithm with pipeline vectorization, a technique known to reduce the execution delay of loops by pipelining the loop body. To widen the applicability of our method, we adapt several transformations used in vectorizing and parallelizing software compilers, such as loop unrolling and loop tiling. Results show that our approach often works well: for instance, our prototype compiler speeds up a manually pipelined convolution design, produced from a Handel-C description for implementation in a Xilinx XC4000 device, by more than a factor of two.
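The abstract does not spell out the sequentialisation algorithm, so the following Python sketch illustrates only the general idea it describes, under our own assumptions: every statement is a single assignment taking one clock cycle (as in Handel-C), the statements are listed in a legal sequential order, and the names `dependency_graph`, `list_schedule`, `cycles`, `DOT4` and the per-kind operator limits are hypothetical. Once the explicit `par`/`seq` grouping is reduced to data dependences, a scheduler is free to trade operators (space) against clock cycles (time).

```python
from collections import defaultdict

# Hypothetical encoding: each op is (name, destination, sources, resource_kind),
# standing in for a Handel-C assignment that takes one clock cycle.


def dependency_graph(ops):
    """Map each op index to the indices it must follow, given sequential order.

    Only read-after-write, write-after-read and write-after-write dependences
    are kept; the original par/seq grouping is discarded.
    """
    preds = {i: set() for i in range(len(ops))}
    last_write = {}                # variable -> index of its latest writer
    readers = defaultdict(list)    # variable -> readers since its latest write
    for i, (_, dest, srcs, _) in enumerate(ops):
        for v in srcs:
            if v in last_write:
                preds[i].add(last_write[v])     # read-after-write
        preds[i].update(readers[dest])          # write-after-read
        if dest in last_write:
            preds[i].add(last_write[dest])      # write-after-write
        for v in srcs:
            readers[v].append(i)
        last_write[dest] = i
        readers[dest] = []                      # new value: no readers yet
    return preds


def list_schedule(ops, preds, limits):
    """Greedy list scheduling: give each op a clock cycle, issuing at most
    limits[kind] ops of each resource kind per cycle (default and minimum 1).

    An op is ready once all predecessors were issued in earlier cycles,
    which is conservative for write-after-read pairs.
    """
    cycle_of = {}
    remaining = set(range(len(ops)))
    cycle = 0
    while remaining:
        used = defaultdict(int)
        ready = sorted(i for i in remaining
                       if all(cycle_of.get(p, cycle) < cycle for p in preds[i]))
        for i in ready:
            kind = ops[i][3]
            if used[kind] < max(1, limits.get(kind, 1)):
                cycle_of[i] = cycle
                used[kind] += 1
                remaining.discard(i)
        cycle += 1
    return cycle_of


def cycles(schedule):
    """Total clock cycles used by a schedule."""
    return max(schedule.values()) + 1


# Illustrative four-term dot product: y = a0*b0 + a1*b1 + a2*b2 + a3*b3.
DOT4 = [(f"m{k}", f"p{k}", (f"a{k}", f"b{k}"), "mul") for k in range(4)] + [
    ("s0", "s01", ("p0", "p1"), "add"),
    ("s1", "s23", ("p2", "p3"), "add"),
    ("s2", "y", ("s01", "s23"), "add"),
]

if __name__ == "__main__":
    preds = dependency_graph(DOT4)
    fast = list_schedule(DOT4, preds, {"mul": 4, "add": 2})   # more operators
    small = list_schedule(DOT4, preds, {"mul": 1, "add": 1})  # fewer operators
    print(cycles(fast), cycles(small))  # 3 6
```

With four multipliers and two adders the dot product completes in 3 cycles; restricted to one multiplier and one adder it needs 6 cycles but far fewer operators. Greedy list scheduling is used here only because it is a simple, well-known heuristic; the scheduler used in the paper may differ.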
Pages: 60-70
Number of pages: 11
Related papers (50 in total)
  • [31] Hardware Implementation of Skeletonization Algorithm for Parallel Asynchronous Image Processing
    Lopich, Alexey
    Dudek, Piotr
    Journal of Signal Processing Systems for Signal Image and Video Technology, 2009, 56 (1): 91-103
  • [33] A Parallel Implementation of LMS Adaptive Filter in Hardware for Landmine Detection
    Desai, T
    Hintz, KJ
    Detection and Remediation Technologies for Mines and Minelike Targets IX, Pts 1 and 2, 2004, 5415: 973-983
  • [34] Design Methods for Parallel Hardware Implementation of Multimedia Iterative Algorithms
    Rana, Vincenzo
    Beretta, Ivan
    Atienza, David
    Nacci, Alessandro A.
    Santambrogio, Marco D.
    Sciuto, Donatella
    IEEE Design & Test, 2013, 30 (4): 71-80
  • [35] An Efficient Hardware Implementation of Parallel EBCOT Algorithm for JPEG 2000
    Saidani, Taoufik
    Atri, Mohamed
    Khriji, Lazhar
    Tourki, Rached
    Journal of Real-Time Image Processing, 2016, 11: 63-74
  • [36] Hirschberg's Algorithm on a GCA and Its Parallel Hardware Implementation
    Jendrsczok, Johannes
    Hoffmann, Rolf
    Keller, Joerg
    Euro-Par 2007 Parallel Processing, Proceedings, 2007, 4641: 815 ff.
  • [37] Parallel Implementation of MAFFT on CUDA-Enabled Graphics Hardware
    Zhu, Xiangyuan
    Li, Kenli
    Salah, Ahmad
    Shi, Lin
    Li, Keqin
    IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2015, 12 (1): 205-218
  • [38] Parallel Hardware Architecture and FPGA Implementation of a Differential Evolution Algorithm
    Jewajinda, Yutana
    TENCON 2014 - 2014 IEEE Region 10 Conference, 2014
  • [39] Hardware Implementation of Partitioned-Parallel Algorithms in Linear Prediction
    Carayannis, G
    Koukoutsis, E
    Halkias, CC
    Signal Processing, 1991, 24 (3): 253-269
  • [40] An Efficient Crossover Architecture for Hardware Parallel Implementation of Genetic Algorithm
    Faraji, Rasoul
    Naji, Hamid Reza
    Neurocomputing, 2014, 128: 316-327