Optimizing compiler for shared-memory multiple SIMD architecture

Times cited: 0
Authors
Zhang, Weihua [1,2]
Qian, Xinglong [1]
Wang, Ye [1]
Zang, Binyu [1]
Zhu, Chuanqi [1]
Affiliations
[1] Fudan Univ, Parallel Proc Inst, Shanghai, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Architect Key Lab, Beijing 100864, Peoples R China
Keywords
algorithms; performance; optimization; shared memory; multiple SIMD; locality; replication
DOI
10.1145/1159974.1134679
CLC classification number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
With the rapid growth of multimedia and gaming applications, these workloads place increasing pressure on the processing capability of modern processors. Multiple SIMD architectures are widely used as accelerators in the multimedia processing field. Because of power consumption and chip size, shared-memory multiple SIMD (SM-SIMD) architectures are mainly used in embedded SoCs. To further suit mobile environments, these architectures are also limited to a small number of registers. Although the shared-memory multiple SIMD architecture simplifies chip design, these constraints are the major obstacles to mapping real multimedia applications onto such architectures. To the best of our knowledge, there has been little research to date on optimization techniques for shared-memory multiple SIMD architectures. In this paper, we present a compiler framework that aims to automatically generate high-performance code for shared-memory multiple SIMD architectures. In this framework, we reduce contention on the shared data bus by increasing register locality, improve data bus utilization through read-only data vector replication, and address the limited number of registers through a resource allocation algorithm. The framework also handles issues concerning data transformation. As the experimental results show, the framework successfully maps real multimedia applications onto shared-memory multiple SIMD architectures. It achieves an average speedup of 3.19 and an average utilization of 52.6% on an SM-SIMD architecture with 8 SIMD units.
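To illustrate why register locality can reduce contention on a shared data bus, here is a minimal toy sketch. It is not the authors' framework. It assumes a single shared bus, counts one bus transfer per vector load, ignores stores and arbitration delays, and uses a hypothetical kernel `out[i] = a[i] * coeff` in which `coeff` is a read-only vector reused on every iteration. The function `bus_transfers` and its parameters are illustrative names only.

```python
def bus_transfers(num_units: int, iters: int, keep_coeff_in_register: bool) -> int:
    """Count vector loads over one shared data bus in a toy SM-SIMD model.

    Hypothetical kernel, run independently on each SIMD unit:
        for i in range(iters): out[i] = a[i] * coeff
    `a[i]` is a fresh input vector each iteration; `coeff` is a single
    read-only vector. Stores and bus arbitration delays are ignored.
    """
    loads_per_unit = iters  # one load of a[i] per iteration
    if keep_coeff_in_register:
        loads_per_unit += 1  # coeff loaded once, then reused from a register
    else:
        loads_per_unit += iters  # coeff reloaded on every iteration
    # All units share the same bus, so their loads add up.
    return num_units * loads_per_unit


if __name__ == "__main__":
    print(bus_transfers(8, 100, keep_coeff_in_register=False))  # 1600
    print(bus_transfers(8, 100, keep_coeff_in_register=True))   # 808
```

In this model, with 8 SIMD units and 100 iterations, keeping `coeff` in a register roughly halves bus traffic (1600 versus 808 transfers). When registers are scarce, not every reused vector can stay resident. That trade-off is presumably what a register-aware resource allocation step has to resolve.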
Pages: 199-208
Page count: 10