Intra Prediction Method for Depth Video Coding by Block Clustering through Deep Learning

Cited by: 0

Authors:

Lee, Dong-seok ^{[1]}

Kwon, Soon-kak ^{[2]}

Affiliations:

[1] Dong Eui Univ, AI Grand ICT Res Ctr, Busan 47340, South Korea

[2] Dong Eui Univ, Dept Comp Software Engn, Busan 47340, South Korea

Source:

SENSORS | 2022 / Vol. 22 / Issue 24

Funding:

National Research Foundation, Singapore;

Keywords:

intra prediction; depth video coding; deep learning; 1D CNN; clustering; COMPRESSION; NETWORK;

DOI:

10.3390/s22249656

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Subject classification codes:

070302 ; 081704 ;

Abstract:

In this paper, we propose an intra-picture prediction method for depth video based on block clustering through a neural network. The proposed method addresses the problem that a block containing two or more clusters degrades the performance of intra prediction for depth video. The proposed neural network consists of a spatial feature prediction network and a clustering network. The spatial feature prediction network exploits spatial features in the vertical and horizontal directions and contains a 1D CNN layer and a fully connected layer. The 1D CNN layer extracts the vertical and horizontal spatial features from the top block and the left block of the reference pixels, respectively. Although a 1D CNN is designed to handle time-series data, it can also extract spatial features when the pixel order along a given direction is treated as the time index. The fully connected layer predicts the spatial features of the block to be coded from the extracted features. The clustering network finds clusters from the spatial features output by the spatial feature prediction network. It consists of four CNN layers: the first three combine the two spatial features in the vertical and horizontal directions, and the last outputs the probabilities that pixels belong to the clusters. Each pixel of the block is predicted by the representative value of its cluster, which is the average of the reference pixels belonging to that cluster. To support intra prediction for various block sizes, the block is scaled to the network input size, and the prediction result is scaled back to the original size. In network training, the mean square error between the original block and the predicted block is used as the loss function. A penalty on output values far from both ends of the output range is added to the loss function so that the network clusters clearly. In the simulation results, the bit rate is reduced by up to 12.45% under the same distortion condition compared with the latest video coding standard.
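
The cluster-based prediction and the penalized training loss described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes two clusters, hard assignment at a 0.5 threshold, and that cluster probabilities are also available for the reference pixels. The names `cluster_predict`, `clustering_loss`, `threshold`, and `weight`, and the penalty form p(1 - p), are hypothetical choices made for illustration; the abstract does not give the exact penalty.

```python
from statistics import mean


def cluster_predict(ref_pixels, ref_probs, block_probs, threshold=0.5):
    """Predict block pixels from cluster representatives (two-cluster sketch).

    ref_pixels  : depth values of the reference pixels (flat list)
    ref_probs   : assumed probability that each reference pixel is in cluster 1
    block_probs : 2D list, probability that each block pixel is in cluster 1
    """
    # Split the reference pixels into two clusters by hard assignment.
    cluster0 = [v for v, p in zip(ref_pixels, ref_probs) if p < threshold]
    cluster1 = [v for v, p in zip(ref_pixels, ref_probs) if p >= threshold]
    # Representative value = average of the reference pixels in the cluster.
    # Fall back to the overall average if a cluster is empty (assumption).
    fallback = mean(ref_pixels)
    rep0 = mean(cluster0) if cluster0 else fallback
    rep1 = mean(cluster1) if cluster1 else fallback
    # Each block pixel takes the representative value of its cluster.
    return [[rep1 if p >= threshold else rep0 for p in row]
            for row in block_probs]


def clustering_loss(original, predicted, block_probs, weight=0.1):
    """MSE plus a penalty on probabilities far from both 0 and 1.

    The penalty p * (1 - p) is zero at 0 and 1 and largest at 0.5.
    It is an assumed form, not taken from the paper.
    """
    n = sum(len(row) for row in original)
    mse = sum((o - q) ** 2
              for row_o, row_q in zip(original, predicted)
              for o, q in zip(row_o, row_q)) / n
    penalty = sum(p * (1 - p) for row in block_probs for p in row) / n
    return mse + weight * penalty


# Toy example: a 2x2 block whose left and right columns lie on different surfaces.
ref_pixels = [100, 100, 100, 200, 200]
ref_probs = [0.1, 0.2, 0.0, 0.9, 1.0]
block_probs = [[0.1, 0.9], [0.2, 0.8]]
pred = cluster_predict(ref_pixels, ref_probs, block_probs)
loss = clustering_loss(pred, pred, block_probs)
```

In this toy case the prediction error is zero, so the remaining loss comes only from the penalty term, which shrinks as the probabilities move toward 0 or 1.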


Pages: 16
