Trade-Offs in Loop Transformations

被引:3
作者
Palkovic, Martin [1 ]
Catthoor, Francky [1 ]
Corporaal, Henk [2 ]
机构
[1] IMEC, SSET, B-3001 Louvain, Belgium
[2] TU Eindhoven, NL-5612 AZ Eindhoven, Netherlands
关键词
Algorithms; Design; Data transfer and storage exploration; optimization; loop transformations; cost components; trade-offs;
D O I
10.1145/1497561.1497565
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Nowadays, multimedia systems deal with huge amounts of memory accesses and large memory footprints. To alleviate the impact of these accesses and reduce the memory footprint, high-level memory exploration and optimization techniques have been proposed. These techniques try to more efficiently utilize the memory hierarchy. An important step in these optimization techniques are loop transformations (LT). They have a crucial effect on later data memory footprint optimization steps and code generation. However, the state-of-the-art work has focused only on individual objectives. The main one in literature involves improving the locality of data accesses, and thus reducing the data memory footprint. It does not consider the trade-offs in the LT step in relation to successive optimization steps. Therefore, it is not globally efficient in mapping the application on the target platform. In this article we will discuss several trade-offs during the loop transformations. To our knowledge, we are the first ones considering these global trade-offs. Previous work always gave mostly one solution, having the best locality and thus the optimized memory footprint, even though some research in two-dimensional trade-offs in this area exists as well. We start from this state-of-the-art solution with minimal footprint. We show that by sacrificing the footprint, we can obtain gains in data reuse (crucial for energy reduction) and reduce the control-flow complexity. We demonstrate our approach on a real-life application, namely the QSDPCM video coder. At the end, we show that considering trade-offs for this application leads to 16% energy reduction in a two-layer memory subsystem and 10% cycle reduction on the ARM platform.
引用
收藏
页数:30
相关论文
共 49 条
[1]  
ABSAR J, 2003, CALL INSTANCE BASED
[2]   AUTOMATIC TRANSLATION OF FORTRAN PROGRAMS TO VECTOR FORM [J].
ALLEN, R ;
KENNEDY, K .
ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 1987, 9 (04) :491-542
[3]   Automatic data mapping of signal processing applications [J].
Ancourt, C ;
Barthou, D ;
Guettier, C ;
Irigoin, F ;
Jeannet, B ;
Jourdan, J ;
Mattioli, J .
IEEE INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, PROCEEDINGS, 1997, :350-362
[4]  
Banakar R, 2002, CODES 2002: PROCEEDINGS OF THE TENTH INTERNATIONAL SYMPOSIUM ON HARDWARE/SOFTWARE CODESIGN, P73, DOI 10.1109/CODES.2002.1003604
[5]   AUTOMATIC PROGRAM PARALLELIZATION [J].
BANERJEE, U ;
EIGENMANN, R ;
NICOLAU, A ;
PADUA, DA .
PROCEEDINGS OF THE IEEE, 1993, 81 (02) :211-243
[6]   Code generation in the polyhedral model is easier than you think [J].
Bastoul, C .
13TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES, PROCEEDINGS, 2004, :7-16
[7]  
Bastoul C., 2003, International Workshop on Languages and Compilers for Parallel Computing, P209
[8]  
Brockmeyer E, 2003, DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION, PROCEEDINGS, P1070
[9]  
Carr S., 1994, SIGPLAN Notices, V29, P252, DOI 10.1145/195470.195557
[10]  
Catthoor F., 2002, DATA ACCESS STORAGE