Accurate entropy modeling in learned image compression with joint enhanced SwinT and CNN

Cited by: 0
Authors
Yang, Dongjian [1 ,2 ]
Fan, Xiaopeng [2 ,3 ]
Meng, Xiandong [2 ]
Zhao, Debin [2 ,3 ]
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Technol, Taoyuan St, Shenzhen 518055, Guangdong, Peoples R China
[2] PengCheng Lab, Smart Coding Inst, Xingke 1st St, Shenzhen 518055, Guangdong, Peoples R China
[3] Harbin Inst Technol, Dept Comp Sci & Technol, Xidazhi St, Harbin 150001, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Learned image compression; Swin transformer; Convolutional neural network; Entropy model;
DOI
10.1007/s00530-024-01405-w
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Recently, learned image compression (LIC) has shown significant research potential. Most existing LIC methods are CNN-based, transformer-based, or a mixture of the two. However, these methods suffer from some degradation in global attention performance, because CNNs have convolution kernels of limited size, while transformers apply window partitioning to reduce computational complexity. This gives rise to two issues: (1) the main autoencoder (AE) and hyper AE exhibit limited transformation capability due to insufficient global modeling, making it difficult to improve the accuracy of the coarse-grained entropy model; (2) the fine-grained entropy model struggles to adaptively exploit a larger range of contexts because of its weaker global modeling capability. In this paper, we propose LIC with a jointly enhanced Swin transformer (SwinT) and CNN to improve entropy modeling accuracy. The key idea is to enhance the global modeling ability of SwinT by introducing neighborhood window attention, while keeping computational complexity acceptable, and to combine it with the local modeling ability of CNN to form the enhanced SwinT and CNN block (ESTCB). Specifically, we rebuild the main AE and hyper AE of LIC on ESTCB, enhancing their global transformation capabilities and yielding a more accurate coarse-grained entropy model. In addition, we combine ESTCB with a checkerboard mask and a channel autoregressive model to develop a spatial-then-channel fine-grained entropy model, expanding the range of contexts that LIC can adaptively reference. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art rate-distortion performance compared with existing LIC models.
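The abstract only names the building blocks, so the following PyTorch sketch illustrates the general idea rather than the authors' implementation: a Swin-style window-attention branch whose keys and values are drawn from an enlarged neighborhood window, combined with a depthwise-convolution branch in one residual block. The class names (`NeighborhoodWindowAttention`, `ESTCB`), the halo-padding scheme, the layer sizes, and the concat-then-project fusion are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeighborhoodWindowAttention(nn.Module):
    """Queries from each non-overlapping window attend to keys/values drawn from a
    larger neighborhood window (assumption: the window padded by `halo` on every side)."""

    def __init__(self, dim, window=8, halo=4, num_heads=4):
        super().__init__()
        self.window, self.halo, self.num_heads = window, halo, num_heads
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                     # x: (B, C, H, W); H, W divisible by window
        B, C, H, W = x.shape
        w, h = self.window, self.halo
        # Queries: partition the feature map into non-overlapping w x w windows.
        q = x.unfold(2, w, w).unfold(3, w, w)                  # (B, C, H/w, W/w, w, w)
        q = q.permute(0, 2, 3, 4, 5, 1).reshape(-1, w * w, C)
        # Keys/values: the same windows enlarged by `halo` pixels on each side.
        kv = F.pad(x, (h, h, h, h))
        kv = kv.unfold(2, w + 2 * h, w).unfold(3, w + 2 * h, w)
        kv = kv.permute(0, 2, 3, 4, 5, 1).reshape(-1, (w + 2 * h) ** 2, C)
        k, v = self.to_kv(kv).chunk(2, dim=-1)
        q = self.to_q(q)

        def split_heads(t):                                    # (N, L, C) -> (N, heads, L, C/heads)
            return t.reshape(t.shape[0], t.shape[1], self.num_heads, -1).transpose(1, 2)

        out = F.scaled_dot_product_attention(split_heads(q), split_heads(k), split_heads(v))
        out = self.proj(out.transpose(1, 2).reshape(-1, w * w, C))
        # Fold the attended windows back into a (B, C, H, W) feature map.
        out = out.reshape(B, H // w, W // w, w, w, C).permute(0, 5, 1, 3, 2, 4)
        return out.reshape(B, C, H, W)


class ESTCB(nn.Module):
    """Illustrative block: neighborhood-window attention (wider context) in parallel
    with a depthwise CNN branch (local detail), fused by a 1x1 convolution."""

    def __init__(self, dim, window=8, halo=4, num_heads=4):
        super().__init__()
        self.attn = NeighborhoodWindowAttention(dim, window, halo, num_heads)
        self.cnn = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),     # depthwise 3x3: local modeling
            nn.GELU(),
            nn.Conv2d(dim, dim, 1),
        )
        self.fuse = nn.Conv2d(2 * dim, dim, 1)                 # concat-then-project fusion (assumption)

    def forward(self, x):
        return x + self.fuse(torch.cat([self.attn(x), self.cnn(x)], dim=1))


if __name__ == "__main__":
    y = ESTCB(dim=64)(torch.randn(1, 64, 32, 32))
    print(y.shape)                                             # torch.Size([1, 64, 32, 32])
```

Restricting each query window to a fixed-size padded neighborhood keeps the attention cost linear in image size, which is one plausible way to widen the receptive field beyond plain Swin windows without resorting to full global attention.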
Pages: 15