Fast Linear Equation Solving Algorithm and Its Pipelined Hardware Architecture Design for VVC Affine Motion Estimation

Cited by: 1

Authors:

Sheng, Qinghua [1]

Chen, Hongzhao [1]

Lai, Changcai [1]

Huang, Xiaofang [2]

Liu, Yuanyuan [1]

Huang, Xiaofeng [3,4]

Yin, Haibing [3,4]

Affiliations:

[1] Hangzhou Dianzi Univ, Sch Elect & Informat Engn, Hangzhou 310018, Peoples R China

[2] Hangzhou Dianzi Univ Informat Engn Coll, Sch Elect Engn, Hangzhou 311305, Peoples R China

[3] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China

[4] Peking Univ, Adv Inst Informat Technol, Hangzhou 311215, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 11

Funding:

National Natural Science Foundation of China

Keywords:

VVC; linear equation solving; hardware implementation; affine motion estimation; FPGA; VIDEO;

DOI:

10.1109/TCSVT.2024.3414422

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline codes:

0808; 0809

Abstract:

The latest video coding standard, versatile video coding (VVC), was developed to achieve higher compression efficiency and to support more media applications than its predecessor, high efficiency video coding (HEVC). To handle nontranslational motion such as rotation and zooming, VVC adopts affine motion compensation for inter prediction. However, affine motion estimation (AME) requires solving a large number of systems of linear equations, which significantly increases complexity. To address this problem, this paper proposes a fast linear equation solving algorithm together with a pipelined hardware architecture. To the best of our knowledge, this is the first work to address the hardware architecture design of linear equation solving in affine mode. First, an integer-based division-free algorithm (I-DFA) is proposed for fast equation solving. Then, a novel dynamic scaling algorithm is proposed to compensate for integer computation errors caused by overflow. Finally, a pipelined and interleaved hardware architecture is proposed to minimize the clock cycles spent on iterations and to improve throughput. The proposed algorithm achieves average time savings of 5.3% and 5.7% with only 0.03% and 0.07% increases in the Bjontegaard delta bit rate (BD-BR) under the low-delay P (LDP) and random access (RA) configurations, respectively. The proposed hardware architecture can solve 16.7M six-parameter affine systems of linear equations per second at an operating frequency of 100 MHz, a 21x improvement over existing methods.
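The abstract does not describe how I-DFA works internally. As a point of reference only, the sketch below shows one standard, well-known way to solve a small linear system with integer arithmetic alone: Bareiss's fraction-free Gaussian elimination (1968), followed by a fraction-free back substitution that returns every unknown as an integer numerator over one shared integer denominator (the determinant, up to sign). It is not the authors' I-DFA or their dynamic scaling method, and the function name `bareiss_solve` and its interface are hypothetical. Python integers are arbitrary-precision, so this sketch never overflows; in fixed-width hardware the intermediate values (determinants of submatrices of the augmented matrix) grow with system size, which is the kind of overflow the abstract says its dynamic scaling algorithm compensates for.

```python
def bareiss_solve(A, b):
    """Solve A x = b over the integers using fraction-free (Bareiss) elimination.

    Hypothetical helper for illustration, not the paper's I-DFA.
    A is an n x n list of ints, b is a list of n ints.
    Returns (num, den) with x[i] == num[i] / den and den > 0,
    or None if A is singular. Every division performed is exact,
    so all intermediate values stay integers.
    """
    n = len(A)
    # Work on an augmented copy [A | b] so the caller's data is untouched.
    a = [list(row) + [rhs] for row, rhs in zip(A, b)]
    prev = 1  # pivot from the previous step (1 before the first step)
    for k in range(n - 1):
        if a[k][k] == 0:
            # Swap in a lower row that has a nonzero entry in column k.
            swap = next((i for i in range(k + 1, n) if a[i][k] != 0), None)
            if swap is None:
                return None  # no usable pivot in this column: A is singular
            a[k], a[swap] = a[swap], a[k]
        for i in range(k + 1, n):
            for j in range(k + 1, n + 1):
                # Cross-multiply instead of dividing by the pivot. Dividing by
                # the previous pivot is exact (Sylvester's identity), so //
                # never discards a remainder here.
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
            a[i][k] = 0
        prev = a[k][k]
    den = a[n - 1][n - 1]  # equals det(A) up to the sign of the row swaps
    if den == 0:
        return None
    # Fraction-free back substitution: num[i] = den * x[i] is an integer
    # (det(A) * x = adj(A) * b), so dividing by each diagonal entry is exact.
    num = [0] * n
    for i in range(n - 1, -1, -1):
        s = den * a[i][n] - sum(a[i][j] * num[j] for j in range(i + 1, n))
        num[i] = s // a[i][i]
    if den < 0:  # normalize so the shared denominator is positive
        num = [-v for v in num]
        den = -den
    return num, den


if __name__ == "__main__":
    A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
    b = [8, -11, -3]
    print(bareiss_solve(A, b))  # ([2, 3, -1], 1), i.e. x = (2, 3, -1)
```

Keeping a single shared denominator defers the only true division to the very end, and it can be skipped entirely when a caller only needs the numerators up to a common scale. A six-parameter affine model has six unknowns, so the same routine would be called with 6 x 6 inputs. For scale, the abstract's throughput figure implies 100 MHz / 16.7M systems per second, or about 6 clock cycles per solved system on average.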


Pages: 11229–11240

Number of pages: 12
