Multi-Stage Spatial and Frequency Feature Fusion using Transformer in CNN-Based In-Loop Filter for VVC

Cited by: 4

Authors:

Kathariya, Birendra [1]

Li, Zhu [1]

Wang, Hongtao [2]

Coban, Mohammad [2]

Affiliations:

[1] Univ Missouri, Kansas City, MO 64110 USA

[2] Qualcomm Technol Inc, San Diego, CA USA

Source:

2022 Picture Coding Symposium (PCS) | 2022

Keywords:

Versatile Video Coding (VVC); In-Loop Filter; Discrete Cosine Transform (DCT); Convolutional Neural Network; Transformer;

DOI:

10.1109/PCS56426.2022.10017998

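Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809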

Abstract:

Versatile Video Coding (VVC)/H.266 is the successor to High Efficiency Video Coding (HEVC)/H.265 and Advanced Video Coding (AVC)/H.264, with significant technical and coding-efficiency improvements. Nonetheless, like its predecessors, it follows the conventional block-based hybrid video coding scheme. As a consequence, the reconstructed picture contains compression artifacts. VVC includes in-loop filters by default to correct these distortions, but the handcrafted filters offer suboptimal performance. In this work, we design a novel convolutional neural network (CNN) to replace the built-in in-loop filter of VVC. The proposed CNN-based in-loop filter uses a modified Spectral-wise Multi-Head Self-Attention (S-MSA) layer from the Multi-stage Spectral-wise Transformer (MST++) at multiple stages to fuse spatial and frequency-decomposed features, extracted from the pixel-domain input and from its discrete cosine transform (DCT), respectively. We name the proposed network MSTFNet, where the first three letters stand for MST++ and F stands for fusion. Because of the multi-stage feature fusion, the proposed CNN acts as a powerful learned in-loop filter that significantly outperforms previous methods. Our experimental results show that the proposed method achieves up to 10.31% average Bjøntegaard Delta (BD)-bitrate savings under the all-intra (AI) configuration for the luma (Y) component.
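The fusion idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' MSTFNet: the block size, the space-to-depth rearrangement used for the spatial branch, the single attention head, the random projection weights, and all function names are assumptions made for illustration only. It shows (a) a block DCT that turns each frequency coefficient into its own channel, and (b) a spectral-wise attention step in the style of S-MSA, where the tokens are channels and the attention map is C x C rather than HW x HW.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def block_dct_channels(img, n=4):
    """n x n block DCT; returns one channel per frequency: (n*n, H/n, W/n)."""
    h, w = img.shape
    d = dct_matrix(n)
    blocks = img.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    coeffs = d @ blocks @ d.T  # 2-D DCT of every block via broadcasting
    return coeffs.reshape(h // n, w // n, n * n).transpose(2, 0, 1)

def space_to_depth(img, n=4):
    """Assumed spatial branch: pixels of each n x n block become channels."""
    h, w = img.shape
    x = img.reshape(h // n, n, w // n, n).transpose(1, 3, 0, 2)
    return x.reshape(n * n, h // n, w // n)

def spectral_attention(x, wq, wk, wv, scale=1.0):
    """Single-head spectral-wise attention over channels of x: (C, H, W)."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T            # (HW, C)
    q = (tokens @ wq).T                       # (C, HW): one token per channel
    k = (tokens @ wk).T
    v = (tokens @ wv).T
    # L2-normalise along the spatial axis so the logits stay bounded
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)
    logits = scale * (k @ q.T)                # (C, C) channel-to-channel map
    logits -= logits.max(axis=-1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over the last axis
    return (attn @ v).reshape(c, h, w)

# Toy usage: concatenate both branches, then let attention mix the channels.
rng = np.random.default_rng(0)
img = rng.random((16, 16))
feats = np.concatenate([space_to_depth(img), block_dct_channels(img)], axis=0)
c = feats.shape[0]
wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
fused = spectral_attention(feats, wq, wk, wv)
print(fused.shape)  # (32, 4, 4)
```

Because the attention map is computed between channels, its cost grows with the number of channels squared rather than with the number of pixels squared, which is one reason a channel-wise formulation is attractive for full-resolution filtering.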


Pages: 373-377

Page count: 5
