Custom parallel caching schemes for hardware-accelerated image compression

Cited by: 6
Authors
Ang, Su-Shin [1 ]
Constantinides, George A. [1 ]
Luk, Wayne [2 ]
Cheung, Peter Y. K. [1 ]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Dept Elect & Elect Engn, London SW7 2AZ, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London SW7 2BZ, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Cache; Scratchpad; Data re-use; Arbitration; Hardware;
DOI
10.1007/s11554-008-0082-0
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In an effort to achieve lower bandwidth requirements, video compression algorithms have become increasingly complex. Consequently, the deployment of these algorithms on field-programmable gate arrays (FPGAs) is becoming increasingly desirable, because of the computational parallelism on these platforms as well as the measure of flexibility afforded to designers. Typically, video data are stored in large and slow external memory arrays, but the impact of the memory access bottleneck may be reduced by buffering frequently used data in fast on-chip memories. The order of the memory accesses resulting from many compression algorithms depends on the input data (Jain in Proceedings of the IEEE, pp. 349-389, 1981). These data-dependent memory accesses complicate the exploitation of data re-use, and subsequently reduce the extent to which an application may be accelerated. In this paper, we present a hybrid memory sub-system which is able to capture data re-use effectively in spite of data-dependent memory accesses. This memory sub-system is made up of a custom parallel cache and a scratchpad memory (SPM). Further, the framework is capable of exploiting 2D spatial locality, which is frequently exhibited in the access patterns of image processing applications. In a case study involving the quad-tree structured pulse code modulation (QSDPCM) application, the impact of data dependence on memory accesses is shown to be significant. In comparison with an implementation that employs only an SPM, performance improvements of up to 1.7x and 1.4x are observed through actual implementation on two modern FPGA platforms. These performance improvements are more pronounced for image sequences exhibiting greater inter-frame movements. In addition, reductions of on-chip memory resources by up to 3.2x are achievable using this framework. These results indicate that, on custom hardware platforms, there is substantial scope for improvement in the capture of data re-use when memory accesses are data dependent.
Pages: 289-302
Number of pages: 14
Related papers
50 items in total
  • [31] Realistic, hardware-accelerated shading and lighting
    Heidrich, W
    Seidel, HP
    SIGGRAPH 99 CONFERENCE PROCEEDINGS, 1999, : 171 - 178
  • [32] Hardware-Accelerated Network Control Planes
    Molero, Edgar Costa
    Vissicchio, Stefano
    Vanbever, Laurent
    HOTNETS-XVII: PROCEEDINGS OF THE 2018 ACM WORKSHOP ON HOT TOPICS IN NETWORKS, 2018, : 120 - 126
  • [33] A Hardware-Accelerated Approach to Chaotic Image Encryption: LTB Map and FPGA Implementation
    Yamni, Mohamed
    Daoui, Achraf
    Plawiak, Pawel
    Alfarraj, Osama
    Abd El-Latif, Ahmed A.
    IEEE ACCESS, 2024, 12 : 103921 - 103940
  • [34] Parallel and Pipelined Filter Operator for Hardware-Accelerated Operator Graphs in Semantic Web Databases
    Werner, Stefan
    Heinrich, Dennis
    Stelzner, Marc
    Groppe, Sven
    Backasch, Rico
    Pionteck, Thilo
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (CIT), 2014, : 539 - 546
  • [35] Improving Performance and Lifetime of Solid-State Drives Using Hardware-Accelerated Compression
    Lee, Sungjin
    Park, Jihoon
    Fleming, Kermin
    Arvind
    Kim, Jihong
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2011, 57 (04) : 1732 - 1739
  • [36] Hardware-accelerated protein identification for mass spectrometry
    Alex, AT
    Dumontier, M
    Rose, JS
    Hogue, CWV
    RAPID COMMUNICATIONS IN MASS SPECTROMETRY, 2005, 19 (06) : 833 - 837
  • [37] Speech Recognition and Understanding on Hardware-Accelerated DSP
    Stemmer, Georg
    Georges, Munir
    Hofer, Joachim
    Rozen, Piotr
    Bauer, Josef
    Nowicki, Jakub
    Bocklet, Tobias
    Colett, Hannah R.
    Falik, Ohad
    Deisher, Michael
    Downing, Sylvia J.
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2036 - 2037
  • [38] Dual Streaming for Hardware-Accelerated Ray Tracing
    Shkurko, Konstantin
    Grant, Tim
    Kopta, Daniel
    Mallett, Ian
    Yuksel, Cem
    Brunvand, Erik
    HPG '17: PROCEEDINGS OF HIGH PERFORMANCE GRAPHICS, 2017,
  • [39] Hardware-accelerated dynamic clustering of virtual crowd members
    Haciomeroglu, Murat
    Ozcan, Cumhur Yigit
    Barut, Oner
    Seckin, Levent
    Sever, Hayri
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2013, 24 (02) : 143 - 153
  • [40] Improved hardware-accelerated visual hull rendering
    Li, M
    Magnor, M
    Seidel, HP
    VISION, MODELING, AND VISUALIZATION 2003, 2003, : 151 - +