Custom parallel caching schemes for hardware-accelerated image compression

Times cited: 6
Authors
Ang, Su-Shin [1]
Constantinides, George A. [1]
Luk, Wayne [2]
Cheung, Peter Y. K. [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Elect & Elect Engn, London SW7 2AZ, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2BZ, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Cache; Scratchpad; Data re-use; Arbitration; Hardware
DOI
10.1007/s11554-008-0082-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In an effort to achieve lower bandwidth requirements, video compression algorithms have become increasingly complex. Consequently, deploying these algorithms on field programmable gate arrays (FPGAs) is becoming increasingly desirable, because of the computational parallelism these platforms offer and the flexibility they afford designers. Typically, video data are stored in large, slow external memory arrays, but the impact of the memory access bottleneck may be reduced by buffering frequently used data in fast on-chip memories. In many compression algorithms, the order of memory accesses depends on the input data (Jain in Proceedings of the IEEE, pp. 349-389, 1981). These data-dependent memory accesses complicate the exploitation of data re-use and consequently limit the extent to which an application may be accelerated. In this paper, we present a hybrid memory sub-system that captures data re-use effectively in spite of data-dependent memory accesses. This memory sub-system is made up of a custom parallel cache and a scratchpad memory (SPM). Further, the framework can exploit 2D spatial locality, which is frequently exhibited in the access patterns of image processing applications. In a case study involving the quad-tree structured differential pulse code modulation (QSDPCM) application, the impact of data dependence on memory accesses is shown to be significant. Compared with an implementation that employs only an SPM, performance improvements of up to 1.7x and 1.4x, respectively, are observed in actual implementations on two modern FPGA platforms. These improvements are more pronounced for image sequences exhibiting greater inter-frame movement. In addition, on-chip memory resource usage can be reduced by up to 3.2x using this framework.
These results indicate that, on custom hardware platforms, there is substantial scope for improving the capture of data re-use when memory accesses are data dependent.
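The following is a minimal sketch, not the authors' design. It shows how a cache whose lines hold 2D pixel tiles can capture re-use among data-dependent reads, such as block fetches at motion-vector offsets. The tile size, the number of line slots, the direct-mapped placement, and the names `Tile2DCache` and `read_block` are all hypothetical. The paper's cache is parallel and uses arbitration, which this single-port model does not attempt to represent.

```python
from dataclasses import dataclass, field


@dataclass
class Tile2DCache:
    """Hypothetical direct-mapped cache whose lines are 2D pixel tiles."""
    tile_w: int = 4   # tile width in pixels (assumed)
    tile_h: int = 4   # tile height in pixels (assumed)
    sets_x: int = 8   # line slots along x (assumed)
    sets_y: int = 8   # line slots along y (assumed)
    hits: int = 0
    misses: int = 0
    tags: dict = field(default_factory=dict)  # slot -> (bx, by) of resident tile

    def access(self, x: int, y: int) -> bool:
        # Map the pixel to its tile, then index the cache in 2D so that
        # neighbouring tiles in both x and y land in different slots.
        bx, by = x // self.tile_w, y // self.tile_h
        slot = (bx % self.sets_x, by % self.sets_y)
        if self.tags.get(slot) == (bx, by):
            self.hits += 1
            return True
        # Miss: model fetching the whole tile from external memory.
        self.tags[slot] = (bx, by)
        self.misses += 1
        return False


def read_block(cache: Tile2DCache, x0: int, y0: int, size: int = 4) -> None:
    """Read a size x size pixel block whose origin (x0, y0) is data dependent."""
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            cache.access(x, y)


if __name__ == "__main__":
    cache = Tile2DCache()
    # Hypothetical candidate positions, e.g. produced by a motion search.
    for x0, y0 in [(10, 10), (11, 10), (10, 12)]:
        read_block(cache, x0, y0)
    print(f"hits={cache.hits} misses={cache.misses}")  # prints: hits=44 misses=4
```

In this model, three overlapping 4x4 reads cost only four tile fetches. The first read misses once per tile it touches, and the two later reads hit entirely, even though their positions were not known in advance. A scratchpad is typically filled on a schedule fixed at design time, which is harder to arrange when read positions depend on the input.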
Pages: 289-302
Number of pages: 14