Fast Linear Equation Solving Algorithm and Its Pipelined Hardware Architecture Design for VVC Affine Motion Estimation

Cited by: 1
Authors
Sheng, Qinghua [1]
Chen, Hongzhao [1]
Lai, Changcai [1]
Huang, Xiaofang [2]
Liu, Yuanyuan [1]
Huang, Xiaofeng [3, 4]
Yin, Haibing [3, 4]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Elect & Informat Engn, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ Informat Engn Coll, Sch Elect Engn, Hangzhou 311305, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China
[4] Peking Univ, Adv Inst Informat Technol, Hangzhou 311215, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
VVC; linear equation solving; hardware implementation; affine motion estimation; FPGA; VIDEO;
DOI
10.1109/TCSVT.2024.3414422
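Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract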
The latest video coding standard, versatile video coding (VVC), was developed to achieve higher video compression efficiency and support more media applications than its predecessor, high-efficiency video coding (HEVC). To handle nontranslational motion, such as rotation and zooming, VVC employs affine motion compensation during interframe prediction. However, complexity increases significantly because affine motion estimation (AME) requires a large number of linear equation solving steps. To address this problem, this paper proposes a fast linear equation solving algorithm and an accompanying pipelined hardware architecture design. To the best of our knowledge, our work is the first attempt to address the hardware architecture design of the linear equation solving algorithm in affine mode. First, an integer-based division-free algorithm (I-DFA) is proposed to achieve fast equation solving. Then, a novel dynamic scaling algorithm is proposed to compensate for integer computation errors due to overflow problems. Finally, a pipelined and interleaved hardware architecture is proposed to minimize the number of iteration clock cycles and improve the throughput. The proposed algorithm achieves average time savings of 5.3% and 5.7% with increases of only 0.03% and 0.07% in the Bjontegaard delta bit rate (BD-BR) under the low-delay P (LDP) and random access (RA) configurations, respectively. The proposed hardware architecture can solve 16.7M six-parameter affine systems of linear equations per second at a working frequency of 100 MHz, which represents a 21x improvement over existing methods.
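To make the idea of integer, division-free equation solving concrete, the following is a minimal sketch of Gauss-Jordan elimination by cross-multiplication, which uses only integer addition, subtraction, and multiplication. It is an illustrative stand-in and not the paper's I-DFA: the function name `solve_cross_multiply` is hypothetical, and Python's arbitrary-precision integers sidestep the overflow problem that the abstract's dynamic scaling step is meant to handle in fixed-width hardware.

```python
from fractions import Fraction


def solve_cross_multiply(a, b):
    """Solve a*x = b for integer a, b without any division.

    Returns a list of (numerator, denominator) pairs with
    x[i] == numerator / denominator. Hypothetical helper for
    illustration only; not the algorithm from the paper.
    """
    n = len(a)
    # Build the augmented matrix [a | b] on copies of the inputs.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Pick any row at or below k with a nonzero entry in column k.
        pivot = next((r for r in range(k, n) if m[r][k] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        m[k], m[pivot] = m[pivot], m[k]
        pivot_val = m[k][k]
        for r in range(n):
            lead = m[r][k]
            if r == k or lead == 0:
                continue
            # Cross-multiply so column k of row r cancels exactly,
            # with no division: row_r = pivot_val*row_r - lead*row_k.
            m[r] = [pivot_val * x - lead * y for x, y in zip(m[r], m[k])]
    # The matrix is now diagonal: m[i][i] * x[i] = m[i][n].
    return [(m[i][n], m[i][i]) for i in range(n)]


if __name__ == "__main__":
    pairs = solve_cross_multiply([[2, 1], [1, 3]], [3, 5])
    print([Fraction(num, den) for num, den in pairs])
```

The intermediate integers grow with every elimination step, which is why a fixed-width implementation needs some form of rescaling. As a back-of-envelope reading of the reported throughput, 100 MHz divided by 16.7M systems per second is about six clock cycles per solved system on average.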
Pages: 11229-11240
Page count: 12