Multi-Stage Spatial and Frequency Feature Fusion using Transformer in CNN-Based In-Loop Filter for VVC

Cited by: 4
Authors
Kathariya, Birendra [1]
Li, Zhu [1]
Wang, Hongtao [2]
Coban, Mohammad [2]
Affiliations
[1] Univ Missouri, Kansas City, MO 64110 USA
[2] Qualcomm Technol Inc, San Diego, CA USA
Source
2022 PICTURE CODING SYMPOSIUM (PCS) | 2022
Keywords
Versatile Video Coding (VVC); In-Loop Filter; Discrete Cosine Transform (DCT); Convolutional Neural Network; Transformer;
DOI
10.1109/PCS56426.2022.10017998
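Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract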
Versatile Video Coding (VVC)/H.266 is the successor to High Efficiency Video Coding (HEVC)/H.265 and Advanced Video Coding (AVC)/H.264, with significant technical and coding improvements. Nonetheless, it follows the same conventional block-based hybrid video coding scheme as its predecessors, and as a consequence the reconstructed picture contains compression artifacts. VVC includes in-loop filters by default to correct these distortions, but these handcrafted filters offer suboptimal performance. In this work, we design a novel convolutional neural network (CNN) to replace the built-in in-loop filter of VVC. The proposed CNN-based in-loop filter uses a modified Spectral-wise Multi-Head Self-Attention (S-MSA) layer from the Multi-stage Spectral-wise Transformer (MST++) at multiple stages to fuse spatial features, extracted from the pixel-domain input, with frequency-decomposed features, extracted from its discrete cosine transform (DCT). We name the proposed network MSTFNet, where the first three letters stand for MST++ and F stands for fusion. Because of the multi-stage feature fusion, the proposed CNN acts as a powerful learned in-loop filter that significantly outperforms previous methods. Our experimental results show that the proposed method achieves average Bjontegaard Delta (BD)-bitrate savings of up to 10.31% under the all-intra (AI) configuration for the luma (Y) component.
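The abstract does not say how the frequency-decomposed input is built, so the following is only a minimal sketch of one common way to do it: apply an orthonormal block DCT and regroup the coefficients so that each channel holds a single frequency from every block. The block size `b=4` and the helper names `dct_1d`, `dct_2d`, and `frequency_channels` are assumptions made for illustration, not details taken from the paper, and the S-MSA fusion itself is not shown.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols[v][u] -> transpose back so the result is indexed [u][v]
    return [list(r) for r in zip(*cols)]


def frequency_channels(image, b=4):
    """Block-DCT an image and regroup coefficients by frequency.

    Returns a dict mapping (u, v) to a 2-D list with one coefficient per
    b x b block. The block size b is a hypothetical choice for this sketch.
    """
    h, w = len(image), len(image[0])
    channels = {(u, v): [] for u in range(b) for v in range(b)}
    for y in range(0, h - h % b, b):
        block_row = {key: [] for key in channels}
        for x in range(0, w - w % b, b):
            block = [image[y + j][x:x + b] for j in range(b)]
            coeffs = dct_2d(block)
            for u in range(b):
                for v in range(b):
                    block_row[(u, v)].append(coeffs[u][v])
        for key in channels:
            channels[key].append(block_row[key])
    return channels


# A flat 8x8 picture: all energy should land in the DC channel (0, 0).
flat = [[10.0] * 8 for _ in range(8)]
channels = frequency_channels(flat, b=4)
```

With this layout, an H x W picture becomes b*b channels of size (H/b) x (W/b), which is the kind of multi-channel tensor a channel-wise (spectral-wise) attention layer could operate on alongside the pixel-domain features.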
Pages: 373-377
Page count: 5
Related papers
24 in total
  • [1] [Anonymous], 2018, P IEEE C COMP VIS PA
  • [2] Blanch MG, 2020, IEEE IMAGE PROC, P783, DOI 10.1109/ICIP40778.2020.9191050
  • [3] Overview of the Versatile Video Coding (VVC) Standard and its Applications
    Bross, Benjamin
    Wang, Ye-Kui
    Ye, Yan
    Liu, Shan
    Chen, Jianle
    Sullivan, Gary J.
    Ohm, Jens-Rainer
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (10) : 3736 - 3764
  • [4] MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction
    Cai, Yuanhao
    Lin, Jing
    Lin, Zudi
    Wang, Haoqian
    Zhang, Yulun
    Pfister, Hanspeter
    Timofte, Radu
    Van Gool, Luc
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 744 - 754
  • [5] Dai Y., 2019, DOCUMENT JVET M0510
  • [6] A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding
    Dai, Yuanying
    Liu, Dong
    Wu, Feng
    [J]. MULTIMEDIA MODELING (MMM 2017), PT I, 2017, 10132 : 28 - 39
  • [7] Compression Artifacts Reduction by a Deep Convolutional Network
    Dong, Chao
    Deng, Yubin
    Loy, Chen Change
    Tang, Xiaoou
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 576 - 584
  • [8] Image Super-Resolution Using Deep Convolutional Networks
    Dong, Chao
    Loy, Chen Change
    He, Kaiming
    Tang, Xiaoou
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (02) : 295 - 307
  • [9] Kolesnikov A., 2021, INT C LEARNING REPRE
  • [10] Li Y., 2021, DOCUMENT JVET V0100