One-for-All: An Efficient Variable Convolution Neural Network for In-Loop Filter of VVC

Cited by: 27
Authors
Huang, Zhijie [1 ]
Sun, Jun [1 ]
Guo, Xiaopeng [1 ]
Shang, Mingyu [1 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing 100871, Peoples R China
Keywords
Encoding; Videos; Feature extraction; Convolution; Adaptation models; Visualization; Training; Variable; in-loop filter; attention; versatile video coding (VVC); CNN;
DOI
10.1109/TCSVT.2021.3089498
Chinese Library Classification (CLC): TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes: 0808; 0809
Abstract
Recently, much research on convolution neural network (CNN) based in-loop filters has been proposed to improve coding efficiency. However, most existing CNN-based filters train and deploy multiple networks for different quantization parameters (QPs) and frame types (FTs), which drastically increases the resources required to train these models and the memory burden on the video codec. In this paper, we propose a novel variable CNN (VCNN) based in-loop filter for VVC, which can effectively handle compressed videos with different QPs and FTs via a single model. Specifically, an efficient and flexible attention module is developed to recalibrate features according to QPs or FTs. We then embed the module into the residual block so that these informative features can be continuously utilized in the residual learning process. To minimize information loss in the learning process of the entire network, we utilize a residual feature aggregation (RFA) module for more efficient feature extraction. On this basis, an efficient network architecture, VCNN, is designed that not only effectively reduces compression artifacts but also adapts to various QPs and FTs. To address the training-data imbalance across QPs and FTs and improve the robustness of the model, a focal mean square error loss function is employed to train the proposed network. We then integrate the VCNN into VVC as an additional in-loop filtering tool after the deblocking filter. Extensive experimental results show that our VCNN approach obtains average BD-rate reductions of 3.63%, 4.36%, 4.23%, and 3.56% under all-intra, low-delay P, low-delay B, and random access configurations, respectively, which is even better than QP-separate models.
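The abstract mentions a focal mean square error loss used to counter training-data imbalance across QPs and FTs. The sketch below illustrates the general idea only: plain MSE re-weighted so that samples with larger reconstruction error contribute more. The specific weighting (squared error raised to a focusing exponent `gamma`) is an assumption for illustration, not the paper's exact formulation.

```python
# Hedged sketch of a focal MSE loss: each sample's squared error e^2
# is multiplied by a focal weight (e^2)^gamma, so hard samples
# (large error) dominate the loss. gamma = 0 recovers plain MSE.
# This weighting scheme is an illustrative assumption.

def focal_mse(pred, target, gamma=1.0):
    """Return mean of (e_i^2)^gamma * e_i^2 with e_i = pred_i - target_i."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal-length")
    total = 0.0
    for p, t in zip(pred, target):
        e2 = (p - t) ** 2
        total += (e2 ** gamma) * e2
    return total / len(pred)
```

With `gamma=1`, a sample whose error is twice as large contributes 16x as much to the loss instead of 4x, which is one simple way to bias training toward under-fitted (e.g., rare QP/FT) samples.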
Pages: 2342-2355
Page count: 14
Cited References
55 records in total
  • [1] NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study
    Agustsson, Eirikur
    Timofte, Radu
    [J]. 2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 1122 - 1131
  • [2] [Anonymous], 2015, IEEE I CONF COMP VIS, DOI 10.1109/ICCV.2015.123
  • [3] [Anonymous], 2019, DOCUMENT JVET M0510
  • [4] Real Image Denoising with Feature Attention
    Anwar, Saeed
    Barnes, Nick
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 3155 - 3164
  • [5] Bjontegaard G., 2001, ITU T Q6SG16 VCEG M3
  • [6] Bossen, 2020, TEL 19 M
  • [7] Bossen, Frank, 2011, Joint Collaborative Team on Video Coding (JCT-VC), JCTVC-F900
  • [8] In-Loop Filter with Dense Residual Convolutional Neural Network for VVC
    Chen, Sijia
    Chen, Zhenzhong
    Wang, Yingbin
    Liu, Shan
    [J]. THIRD INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2020), 2020, : 153 - 156
  • [9] A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding
    Dai, Yuanying
    Liu, Dong
    Wu, Feng
    [J]. MULTIMEDIA MODELING (MMM 2017), PT I, 2017, 10132 : 28 - 39
  • [10] A Switchable Deep Learning Approach for In-Loop Filtering in Video Coding
    Ding, Dandan
    Kong, Lingyi
    Chen, Guangyao
    Liu, Zoe
    Fang, Yong
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (07) : 1871 - 1887