Deep Learning and Bidirectional Optical Flow Based Viewport Predictions for 360° Video Coding

Cited by: 3
Authors
Adhuran, Jayasingam [1 ]
Kulupana, Gosala [1 ]
Fernando, Anil [2 ]
Affiliations
[1] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, England
[2] Univ Strathclyde, Dept Comp & Informat Sci, Glasgow G1 1XQ, Scotland
Keywords
360° video; perceptual coding; Regions of Interest; viewport prediction; Versatile Video Coding; virtual reality; optimization
DOI
10.1109/ACCESS.2022.3219861
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The rapid development of virtual reality applications continues to demand better compression of 360° videos owing to the large volume of content. These videos are typically converted to 2-D formats using various projection techniques in order to benefit from existing coding tools designed for conventional 2-D video compression. Although the recently emerged video coding standard, Versatile Video Coding (VVC), introduces 360°-video-specific coding tools, it does not prioritize the user-observed regions of 360° videos, which are represented by rectilinear images called viewports. This leads to the encoding of redundant regions in the video frames, escalating the bitrate cost of the videos. In response to this issue, this paper proposes a novel 360° video coding framework for VVC which exploits user-observed viewport information to alleviate pixel redundancy in 360° videos. In this regard, bidirectional optical flow, Gaussian filtering and Spherical Convolutional Neural Networks (Spherical CNNs) are deployed to extract perceptual features and predict user-observed viewports. By appropriately fusing the predicted viewports onto the 2-D projected 360° video frames, a novel Regions of Interest (ROI) aware weight map is developed, which can be used to mask the source video and introduce adaptive changes to the Lagrange and quantization parameters in VVC. Comprehensive experiments conducted with the VVC Test Model (VTM) 7.0 show that the proposed framework achieves an average bitrate saving of 5.85%, and up to 17.15%, at the same perceptual quality, measured using the Viewport Peak Signal-to-Noise Ratio (VPSNR).
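To illustrate the general idea described in the abstract, the sketch below shows how a predicted-viewport ROI weight map could modulate per-CTU quantization. This is a minimal, hypothetical Python sketch, not the authors' implementation: the Gaussian fusion of viewport centres, the helper names, and the CTU-level QP mapping are assumptions made only to clarify how viewport prediction can bias bit allocation toward observed regions.

```python
import numpy as np

def gaussian_roi_weightmap(frame_h, frame_w, viewport_centers, sigma=0.15):
    """Fuse predicted viewport centres (normalised ERP coordinates in [0, 1])
    into one ROI weight map in [0, 1] using Gaussian falloff.
    Hypothetical illustration; the paper's actual fusion differs."""
    ys, xs = np.mgrid[0:frame_h, 0:frame_w]
    ys = ys / frame_h
    xs = xs / frame_w
    weights = np.zeros((frame_h, frame_w), dtype=np.float32)
    for cy, cx in viewport_centers:
        d2 = (ys - cy) ** 2 + (xs - cx) ** 2
        weights = np.maximum(weights, np.exp(-d2 / (2.0 * sigma ** 2)))
    return weights

def ctu_qp_offsets(weightmap, ctu_size=128, base_qp=32, max_offset=6):
    """Map the mean ROI weight of each CTU to a QP value: salient CTUs keep
    the base QP, background CTUs get coarser quantization (higher QP).
    In VVC-style RDO the Lagrange multiplier scales roughly as 2**(dQP/3),
    so raising QP in background CTUs also raises the rate penalty there."""
    h, w = weightmap.shape
    n_rows = (h + ctu_size - 1) // ctu_size
    n_cols = (w + ctu_size - 1) // ctu_size
    qp = np.empty((n_rows, n_cols), dtype=np.int32)
    for r in range(n_rows):
        for c in range(n_cols):
            block = weightmap[r * ctu_size:(r + 1) * ctu_size,
                              c * ctu_size:(c + 1) * ctu_size]
            qp[r, c] = base_qp + int(round((1.0 - block.mean()) * max_offset))
    return qp

# Example: one predicted viewport centred slightly left of the ERP midpoint.
roi = gaussian_roi_weightmap(1024, 2048, [(0.5, 0.4)])
print(ctu_qp_offsets(roi))
```

The design intent mirrors the abstract: regions far from any predicted viewport receive larger QP (and thus larger Lagrange) values, so fewer bits are spent on content the user is unlikely to observe.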
Pages: 118380-118396
Number of pages: 17