Fast Motion Estimation Algorithm and Design for Real Time QFHD High Efficiency Video Coding

Times cited: 47
Authors
Jou, Shiaw-Yu [1]
Chang, Shan-Jung [1]
Chang, Tian-Sheuan [2]
Affiliations
[1] PixArt, Hsinchu, Taiwan
[2] Natl Chiao Tung Univ, Dept Elect Engn, Hsinchu 30010, Taiwan
Keywords
High Efficiency Video Coding (HEVC); motion estimation (ME); very-large-scale integration (VLSI) architecture; diamond search algorithm; architecture design; H.264/AVC; single
DOI
10.1109/TCSVT.2015.2389472
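Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract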
Motion estimation (ME) in the latest High Efficiency Video Coding (HEVC) standard adopts a quadtree coding structure and prediction unit (PU) sizes of up to 64 x 64 to improve coding gain. Compared with previous standards, however, these techniques raise serious design problems in complexity, data dependency, external memory bandwidth, and on-chip buffer size, especially for real-time ultrahigh-definition video coding. To solve these problems, this paper proposes an efficient ME design based on joint algorithm and architecture optimization. To reduce complexity, we propose a predictive integer ME (IME) algorithm that uses statistical analysis to select the most probable search directions and steps, reducing the number of search points by 90.5%. We also employ a PU size-dependent fractional ME (FME) algorithm that reduces interpolation filtering by 62.4% compared with the reference software. To resolve the resulting data dependency, we cascade the IME and FME computations via interlaced scheduling and propose an early motion vector prediction candidate approach. With this scheduling, a 16 x 16 processing unit computes, in an interlaced order, the partial matching cost of all PUs that share the same 16 x 16 current block, and these PUs share their common reference block, which reduces both the on-chip buffer size and the off-chip memory bandwidth. Bandwidth is further reduced by a cache with double Z-scan indexed addressing, which simplifies the cache controller. Implemented in a Taiwan Semiconductor Manufacturing Company (TSMC) 90-nm CMOS process, the design supports real-time encoding of 4K x 2K video at 60 frames/s when operated at 270 MHz, using 778.7k logic gates and 17.4 KB of on-chip memory.
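The idea of computing partial matching costs once per 16 x 16 current block and sharing them across every PU shape can be illustrated with a small software sketch. This is a minimal illustration under stated assumptions, not the authors' hardware design: it assumes the sum of absolute differences (SAD) as the matching cost (the motion vector rate term is omitted), 8 x 8 tiles as the sharing granularity, only the symmetric PU shapes (2Nx2N, 2NxN, Nx2N, NxN; asymmetric partitions are left out), and a plain bit-interleaving Morton index as a stand-in for Z-scan ordering. The function names `morton_index`, `sub_block_sads`, and `merge_pu_costs` are hypothetical.

```python
import random
from typing import Dict, List

Block = List[List[int]]  # row-major 2-D array of pixel values


def morton_index(x: int, y: int, bits: int = 8) -> int:
    """Z-scan (Morton) index: interleave x into the even bits and y into the odd bits."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)
        idx |= ((y >> b) & 1) << (2 * b + 1)
    return idx


def sub_block_sads(cur: Block, ref: Block, sub: int = 8) -> Block:
    """SAD of every sub x sub tile of a current/reference block pair.

    Each pixel difference is computed exactly once; the result is a small grid
    of partial costs that every PU shape inside the block can reuse.
    """
    n = len(cur) // sub
    grid = [[0] * n for _ in range(n)]
    for r in range(len(cur)):
        for c in range(len(cur[0])):
            grid[r // sub][c // sub] += abs(cur[r][c] - ref[r][c])
    return grid


def merge_pu_costs(g: Block) -> Dict[str, int]:
    """Build the costs of the symmetric PUs in a 16x16 block from a 2x2 grid of 8x8 SADs.

    Names follow width x height, so "16x8" is a 16-wide, 8-tall PU.
    """
    costs = {
        "16x16": g[0][0] + g[0][1] + g[1][0] + g[1][1],
        "16x8_top": g[0][0] + g[0][1],
        "16x8_bottom": g[1][0] + g[1][1],
        "8x16_left": g[0][0] + g[1][0],
        "8x16_right": g[0][1] + g[1][1],
    }
    # The four 8x8 PUs, labelled by Z-scan position (z0=TL, z1=TR, z2=BL, z3=BR).
    for y in range(2):
        for x in range(2):
            costs[f"8x8_z{morton_index(x, y)}"] = g[y][x]
    return costs


if __name__ == "__main__":
    random.seed(0)
    cur = [[random.randrange(256) for _ in range(16)] for _ in range(16)]
    ref = [[random.randrange(256) for _ in range(16)] for _ in range(16)]
    costs = merge_pu_costs(sub_block_sads(cur, ref))
    direct = sum(abs(a - b) for ra, rb in zip(cur, ref) for a, b in zip(ra, rb))
    assert costs["16x16"] == direct  # merged cost equals a direct 16x16 SAD
    print(costs)
```

In this sketch, larger PUs (32 x 32 up to 64 x 64) could be costed the same way by summing the 16 x 16 results of neighboring blocks, so one pass over the shared reference data serves every PU size. The abstract does not state whether the paper's hardware merges costs in exactly this way.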
Pages: 1533-1544
Number of pages: 12