Analysis and architecture design of variable block-size motion estimation for H.264/AVC

Cited by: 181
Authors
Chen, CY [1]
Chien, SY
Huang, YW
Chen, TC
Wang, TC
Chen, LG
Affiliations
[1] Natl Taiwan Univ, DSP IC Design Lab, Grad Inst Elect Engn, Taipei 10617, Taiwan
[2] Natl Taiwan Univ, Dept Elect Engn 2, Taipei 10617, Taiwan
[3] Chin Fong Machine Ind, Changhua 50445, Taiwan
Keywords
block matching; H.264/AVC; motion estimation (ME); variable block size; very large scale integration (VLSI) architecture
DOI
10.1109/TCSI.2005.858488
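Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract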
Variable block-size motion estimation (VBSME) has become an important video coding technique, but it increases the difficulty of hardware design. In this paper, we use inter-/intra-level classification and various data flows to analyze the impact of supporting VBSME in different hardware architectures. Furthermore, we propose two hardware architectures that support traditional fixed block-size motion estimation as well as VBSME with less chip-area overhead than previous approaches. By broadcasting reference pixel rows and propagating partial sums of absolute differences (SADs), the first design has fewer reference pixel registers and a shorter critical path. The second design uses a two-dimensional distortion array and one adder tree, with a reference buffer that maximizes data reuse between successive search candidates. The first design is suitable for low resolutions or small search ranges, while the second design has the advantage of supporting both a high degree of parallelism and VBSME. Finally, we propose an eight-parallel SAD tree with a shared reference buffer for H.264/AVC integer motion estimation (IME). Its processing capability is eight times that of a single SAD tree, but the reference buffer size is only doubled. Moreover, the most critical issue of H.264 IME, its huge memory bandwidth, is overcome: we save 99.9% of the off-chip memory bandwidth and 99.22% of the on-chip memory bandwidth. We demonstrate a 720p, 30-fps solution at 108 MHz with a gate count of 330.2k and 208 kbits of on-chip memory.
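The SAD reuse that makes VBSME tractable can be illustrated in software. The following is a minimal sketch, not the architecture described in the abstract: it assumes the standard H.264 macroblock partitioning (4x4, 8x4, 4x8, 8x8, 16x8, 8x16, 16x16, giving 41 blocks per 16x16 macroblock) and computes the sixteen 4x4 SADs once, then merges them into the larger block sizes. The function name `vbsme_sads` and the dictionary keys are hypothetical and chosen only for illustration.

```python
import numpy as np

def vbsme_sads(cur, ref):
    """SADs of all 41 partitions for one 16x16 candidate (illustrative sketch).

    cur, ref: 16x16 arrays holding the current macroblock and one
    reference candidate. Keys are "WxH" strings (width x height).
    """
    assert cur.shape == (16, 16) and ref.shape == (16, 16)
    # Per-pixel absolute differences, widened to avoid uint8 overflow.
    diff = np.abs(cur.astype(np.int32) - ref.astype(np.int32))

    # Sixteen 4x4 SADs: s4[R, C] covers rows 4R..4R+3, cols 4C..4C+3.
    s4 = diff.reshape(4, 4, 4, 4).sum(axis=(1, 3))

    # Larger blocks are sums of the 4x4 SADs; no pixel is revisited.
    s8x4 = s4[:, 0::2] + s4[:, 1::2]      # 8 wide, 4 tall  -> shape (4, 2)
    s4x8 = s4[0::2, :] + s4[1::2, :]      # 4 wide, 8 tall  -> shape (2, 4)
    s8x8 = s8x4[0::2, :] + s8x4[1::2, :]  # 8x8             -> shape (2, 2)
    s16x8 = s8x8[:, 0] + s8x8[:, 1]       # 16 wide, 8 tall -> shape (2,)
    s8x16 = s8x8[0, :] + s8x8[1, :]       # 8 wide, 16 tall -> shape (2,)
    s16x16 = s16x8.sum()                  # whole macroblock

    return {
        "4x4": s4, "8x4": s8x4, "4x8": s4x8, "8x8": s8x8,
        "16x8": s16x8, "8x16": s8x16, "16x16": np.array([s16x16]),
    }

# Usage with arbitrary pixel data (seeded for repeatability).
rng = np.random.default_rng(0)
cur = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
ref = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
sads = vbsme_sads(cur, ref)
total_blocks = sum(v.size for v in sads.values())  # 16+8+8+4+2+2+1 = 41
```

A full-search IME would evaluate this for every candidate position in the search range and keep the minimum SAD per partition; the hardware designs in the abstract address how to do that with high parallelism and low memory bandwidth, which this sketch does not model.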
Pages: 578-593
Number of pages: 16