Custom parallel caching schemes for hardware-accelerated image compression

Cited by: 6
Authors
Ang, Su-Shin [1 ]
Constantinides, George A. [1 ]
Luk, Wayne [2 ]
Cheung, Peter Y. K. [1 ]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Elect & Elect Engn, London SW7 2AZ, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2BZ, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Cache; Scratchpad; Data re-use; Arbitration; Hardware;
DOI
10.1007/s11554-008-0082-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In an effort to achieve lower bandwidth requirements, video compression algorithms have become increasingly complex. Consequently, the deployment of these algorithms on field programmable gate arrays (FPGAs) is becoming increasingly desirable, because of the computational parallelism on these platforms as well as the measure of flexibility afforded to designers. Typically, video data are stored in large and slow external memory arrays, but the impact of the memory access bottleneck may be reduced by buffering frequently used data in fast on-chip memories. The order of the memory accesses resulting from many compression algorithms is dependent on the input data (Jain in Proceedings of the IEEE, pp. 349-389, 1981). These data-dependent memory accesses complicate the exploitation of data re-use, and subsequently reduce the extent to which an application may be accelerated. In this paper, we present a hybrid memory sub-system which is able to capture data re-use effectively in spite of data-dependent memory accesses. This memory sub-system is made up of a custom parallel cache and a scratchpad memory (SPM). Further, the framework is capable of exploiting 2D spatial locality, which is frequently exhibited in the access patterns of image processing applications. In a case study involving the quad-tree structured pulse code modulation (QSDPCM) application, the impact of data dependence on memory accesses is shown to be significant. In comparison with an implementation which only employs an SPM, performance improvements of up to 1.7x and 1.4x are observed through actual implementation on two modern FPGA platforms. These performance improvements are more pronounced for image sequences exhibiting greater inter-frame movements. In addition, reductions of on-chip memory resources by up to 3.2x are achievable using this framework.
These results indicate that, on custom hardware platforms, there is substantial scope for improvement in the capture of data re-use when memory accesses are data dependent.
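The abstract does not describe the cache organisation in detail, so the following is only a minimal software sketch of the general idea of 2D spatial locality, not the authors' hardware design. It assumes a hypothetical direct-mapped cache (`TileCache`) in which each line holds a rectangular tile of pixels, and compares a 4x4 tile against a 16x1 row segment of the same capacity when a square pixel block is read. The frame width, line count, and helper `read_block` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TileCache:
    """Hypothetical direct-mapped cache; each line holds one tile_w x tile_h pixel tile."""
    tile_w: int
    tile_h: int
    num_lines: int
    frame_w: int
    tags: dict = field(default_factory=dict)  # line index -> tile id currently stored
    hits: int = 0
    misses: int = 0

    def access(self, x: int, y: int) -> None:
        """Look up pixel (x, y); on a miss, the whole containing tile is loaded."""
        tiles_per_row = -(-self.frame_w // self.tile_w)  # ceiling division
        tile_id = (y // self.tile_h) * tiles_per_row + (x // self.tile_w)
        index = tile_id % self.num_lines  # direct-mapped placement
        if self.tags.get(index) == tile_id:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tile_id


def read_block(cache: TileCache, x0: int, y0: int, size: int) -> None:
    """Read a size x size pixel block in raster order, as a block matcher might."""
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            cache.access(x, y)


if __name__ == "__main__":
    # Same line capacity (16 pixels), different line shapes.
    square = TileCache(tile_w=4, tile_h=4, num_lines=8, frame_w=64)
    row = TileCache(tile_w=16, tile_h=1, num_lines=8, frame_w=64)
    for cache in (square, row):
        read_block(cache, 0, 0, 4)
        print(f"{cache.tile_w}x{cache.tile_h} lines: "
              f"{cache.hits} hits, {cache.misses} misses")
```

In this toy setting, a line shaped like the access footprint needs fewer external fetches for the same block, which is the intuition behind exploiting 2D locality. A real design would also have to handle parallel ports, arbitration, and the scratchpad, none of which are modelled here.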
Pages: 289-302
Number of pages: 14
Related papers
50 in total
  • [1] Custom parallel caching schemes for hardware-accelerated image compression
    Su-Shin Ang
    George A. Constantinides
    Wayne Luk
    Peter Y. K. Cheung
    Journal of Real-Time Image Processing, 2008, 3
  • [2] A parallel hardware hypervisor for hardware-accelerated cloud computing
    Dogan, Atakan
    Ebcioglu, Kemal
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (09)
  • [3] Sabre: Hardware-Accelerated Snapshot Compression for Serverless MicroVMs
    Lazarev, Nikita
    Gohil, Varun
    Tsai, James
    Anderson, Andy
    Chitlur, Bhushan
    Zhang, Zhiru
    Delimitrou, Christina
    PROCEEDINGS OF THE 18TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2024, 2024, : 1 - 18
  • [4] HARDWARE-ACCELERATED PARALLEL-SPLIT SHADOW MAPS
    Zhang, Fan
    Sun, Hanqiu
    Xu, Leilei
    Lee, Kitlun
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2008, 8 (02) : 223 - 241
  • [5] A hardware-accelerated patch search engine for image completion
    Lin, Yi
    2006 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-6, PROCEEDINGS, 2006, : 3949 - 3954
  • [6] Pgx: Hardware-Accelerated Parallel Game Simulators for Reinforcement Learning
    Koyamada, Sotetsu
    Okano, Shinri
    Nishimori, Soichiro
    Murata, Yu
    Habara, Keigo
    Kita, Haruka
    Ishii, Shin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [7] Implementation of Hardware-Accelerated Scalable Parallel Random Number Generators
    Lee, JunKyu
    Peterson, Gregory D.
    Harrison, Robert J.
    Hinde, Robert J.
    VLSI DESIGN, 2010, 2010
  • [8] Place and Route for Massively Parallel Hardware-Accelerated Functional Verification
    Moffitt, Michael D.
    Guenther, Gernot E.
    Pasnik, Kevin A.
    2013 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2013, : 466 - 472
  • [9] Hardware-accelerated objective function evaluation for medical image registration
    Withayachumnankul, W
    Laksanapanai, B
    Pintavirooj, C
    TENCON 2004 - 2004 IEEE REGION 10 CONFERENCE, VOLS A-D, PROCEEDINGS: ANALOG AND DIGITAL TECHNIQUES IN ELECTRICAL ENGINEERING, 2004, : A419 - A422
  • [10] A Systematic Review of Hardware-Accelerated Compression of Remotely Sensed Hyperspectral Images
    Altamimi, Amal
    Ben Youssef, Belgacem
    SENSORS, 2022, 22 (01)