Iris: Automatic Generation of Efficient Data Layouts for High Bandwidth Utilization

Cited by: 2
Authors
Soldavini, Stephanie [1]
Sciuto, Donatella [1]
Pilato, Christian [1]
Affiliations
[1] Politecnico di Milano, Milan, Italy
Source
2023 28TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC | 2023
Funding
EU Horizon 2020
DOI
10.1145/3566097.3567892
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Optimizing data movements is becoming one of the biggest challenges in heterogeneous computing to cope with data deluge and, consequently, big data applications. When creating specialized accelerators, modern high-level synthesis (HLS) tools are increasingly efficient in optimizing the computational aspects, but data transfers have not been adequately improved. To combat this, novel architectures such as High-Bandwidth Memory with wider data busses have been developed so that more data can be transferred in parallel. Designers must tailor their hardware/software interfaces to fully exploit the available bandwidth. HLS tools can automate this process, but the designer must follow strict coding-style rules. If the bus width is not evenly divisible by the data width (e.g., when using custom-precision data types) or if the arrays are not power-of-two length, the HLS-generated accelerator will likely not fully utilize the available bandwidth, demanding even more manual effort from the designer. We propose a methodology to automatically find and implement a data layout that, when streamed between memory and an accelerator, uses a higher percentage of the available bandwidth than a naive or HLS-optimized design. We borrow concepts from multiprocessor scheduling to achieve such high efficiency.
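The bandwidth loss described in the abstract can be illustrated with a little arithmetic. The sketch below is not the Iris algorithm; it only compares two hypothetical layouts under simple assumptions: a "naive" layout in which an element never straddles a bus word, and a "packed" layout in which elements are laid out back to back across word boundaries. The function names and the example widths (a 256-bit bus, 24-bit and 32-bit elements) are assumptions chosen for illustration.

```python
import math


def naive_efficiency(bus_width: int, data_width: int, length: int) -> float:
    """Fraction of transferred bits that carry data when each bus word
    holds only whole elements (assumed naive layout)."""
    per_word = bus_width // data_width  # whole elements that fit in one word
    if per_word == 0:
        raise ValueError("element is wider than the bus")
    words = math.ceil(length / per_word)  # last word may be partly empty
    return (length * data_width) / (words * bus_width)


def packed_efficiency(bus_width: int, data_width: int, length: int) -> float:
    """Fraction of transferred bits that carry data when elements are
    packed contiguously and may straddle word boundaries (assumed layout)."""
    total_bits = length * data_width
    words = math.ceil(total_bits / bus_width)  # only the final word has padding
    return total_bits / (words * bus_width)


if __name__ == "__main__":
    # 24-bit custom-precision elements on a 256-bit bus: 256 is not a
    # multiple of 24, so the naive layout wastes 16 bits in every word.
    print(naive_efficiency(256, 24, 1000))   # 0.9375
    print(packed_efficiency(256, 24, 1000))  # about 0.997
    # 32-bit elements with a power-of-two length waste nothing either way.
    print(naive_efficiency(256, 32, 1024))   # 1.0
```

With these assumed numbers, the naive layout leaves 16 of every 256 bits unused, while dense packing wastes only the padding in the final word. Packing alone ignores how several arrays share the bus, which is where a scheduling-style formulation such as the one the abstract mentions would come in.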
Pages: 172-177
Number of pages: 6