FCD-CNN: FPGA-based CU depth decision for HEVC intra encoder using CNN

Cited by: 2

Authors:

Dehnavi, Hossein [1]

Dehnavi, Mohammad [1]

Klidbary, Sajad Haghzad [2]

Affiliations:

[1] Kermanshah Univ Technol, Energy Fac, Dept Elect Engn, Kermanshah, Iran

[2] Univ Zanjan, Dept Elect & Comp Engn, Zanjan, Iran

Source:

JOURNAL OF REAL-TIME IMAGE PROCESSING | 2024 / Vol. 21 / Issue 04

Keywords:

FPGA; Video compression; Hardware architecture; HEVC;

DOI:

10.1007/s11554-024-01487-9

CLC classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video compression for storage and transmission has always been a focal point for researchers in the field of image processing. Their efforts aim to reduce the data volume required for video representation while maintaining its quality. HEVC is one of the efficient standards for video compression, receiving special attention due to the increasing demand for high-resolution videos. The main step in video compression involves dividing the coding unit (CU) blocks into smaller blocks that have a uniform texture. In traditional methods, the Discrete Cosine Transform (DCT) is applied, followed by the use of RDO for decision-making on partitioning. This paper presents a novel convolutional neural network (CNN) and its hardware implementation as an alternative to DCT, aimed at speeding up partitioning and reducing the hardware resources required. The proposed hardware utilizes an efficient and lightweight CNN to partition CUs with low hardware resources in real-time applications. This CNN is trained for different Quantization Parameters (QPs) and block sizes to prevent overfitting. Furthermore, the system's input size is fixed at 16×16, and other input sizes are scaled to this dimension. Loop unrolling, data reuse, and resource sharing are applied in hardware implementation to save resources. The hardware architecture is fixed for all block sizes and QPs, and only the coefficients of the CNN are changed.
In terms of compression quality, the proposed hardware achieves a 4.42% BD-BR and −0.19 BD-PSNR compared to HM16.5. The proposed system can process a 64×64 CU in 4914 clock cycles at 150 MHz. The hardware resources utilized by the proposed system include 13,141 LUTs, 15,885 flip-flops, 51 BRAMs, and 74 DSPs.


Pages: 10
