Accurate entropy modeling in learned image compression with joint enhanced SwinT and CNN

Cited by: 0
Authors
Yang, Dongjian [1 ,2 ]
Fan, Xiaopeng [2 ,3 ]
Meng, Xiandong [2 ]
Zhao, Debin [2 ,3 ]
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Technol, Taoyuan St, Shenzhen 518055, Guangdong, Peoples R China
[2] PengCheng Lab, Smart Coding Inst, Xingke 1st St, Shenzhen 518055, Guangdong, Peoples R China
[3] Harbin Inst Technol, Dept Comp Sci & Technol, Xidazhi St, Harbin 150001, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Learned image compression; Swin transformer; Convolutional neural network; Entropy model;
DOI
10.1007/s00530-024-01405-w
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Recently, learned image compression (LIC) has shown significant research potential. Most existing LIC methods are CNN-based, transformer-based, or a mixture of the two. However, these methods suffer from some degradation in global attention performance, because CNNs have convolution kernels of limited size, while transformers apply window partitioning to reduce computational complexity. This gives rise to two issues: (1) the main autoencoder (AE) and hyper AE exhibit limited transformation capability due to insufficient global modeling, making it difficult to improve the accuracy of the coarse-grained entropy model; (2) the fine-grained entropy model struggles to adaptively exploit a larger range of contexts because of its weaker global modeling capability. In this paper, we propose LIC with a jointly enhanced Swin transformer (SwinT) and CNN to improve entropy modeling accuracy. The key idea is to enhance the global modeling ability of SwinT by introducing neighborhood window attention, while keeping computational complexity acceptable, and to combine it with the local modeling ability of CNN to form the enhanced SwinT and CNN block (ESTCB). Specifically, we rebuild the main AE and hyper AE of LIC on ESTCB, enhancing their global transformation capabilities and yielding a more accurate coarse-grained entropy model. In addition, we combine ESTCB with a checkerboard mask and a channel autoregressive model to develop a spatial-then-channel fine-grained entropy model, expanding the range of contexts that LIC can adaptively reference. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art rate-distortion performance compared with existing LIC models.
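The abstract only names the building blocks, so the following PyTorch sketch illustrates the general idea rather than the authors' implementation: a Swin-style window-attention branch whose keys and values are drawn from an enlarged neighborhood window, combined with a depthwise-convolution branch in one residual block. The class names (`NeighborhoodWindowAttention`, `ESTCB`), the halo-padding scheme, the layer sizes, and the concat-then-project fusion are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeighborhoodWindowAttention(nn.Module):
    """Queries from each non-overlapping window attend to keys/values drawn from a
    larger neighborhood window (assumption: the window padded by `halo` on every side)."""

    def __init__(self, dim, window=8, halo=4, num_heads=4):
        super().__init__()
        self.window, self.halo, self.num_heads = window, halo, num_heads
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                     # x: (B, C, H, W); H, W divisible by window
        B, C, H, W = x.shape
        w, h = self.window, self.halo
        # Queries: partition the feature map into non-overlapping w x w windows.
        q = x.unfold(2, w, w).unfold(3, w, w)                  # (B, C, H/w, W/w, w, w)
        q = q.permute(0, 2, 3, 4, 5, 1).reshape(-1, w * w, C)
        # Keys/values: the same windows enlarged by `halo` pixels on each side.
        kv = F.pad(x, (h, h, h, h))
        kv = kv.unfold(2, w + 2 * h, w).unfold(3, w + 2 * h, w)
        kv = kv.permute(0, 2, 3, 4, 5, 1).reshape(-1, (w + 2 * h) ** 2, C)
        k, v = self.to_kv(kv).chunk(2, dim=-1)
        q = self.to_q(q)

        def split_heads(t):                                    # (N, L, C) -> (N, heads, L, C/heads)
            return t.reshape(t.shape[0], t.shape[1], self.num_heads, -1).transpose(1, 2)

        out = F.scaled_dot_product_attention(split_heads(q), split_heads(k), split_heads(v))
        out = self.proj(out.transpose(1, 2).reshape(-1, w * w, C))
        # Fold the attended windows back into a (B, C, H, W) feature map.
        out = out.reshape(B, H // w, W // w, w, w, C).permute(0, 5, 1, 3, 2, 4)
        return out.reshape(B, C, H, W)


class ESTCB(nn.Module):
    """Illustrative block: neighborhood-window attention (wider context) in parallel
    with a depthwise CNN branch (local detail), fused by a 1x1 convolution."""

    def __init__(self, dim, window=8, halo=4, num_heads=4):
        super().__init__()
        self.attn = NeighborhoodWindowAttention(dim, window, halo, num_heads)
        self.cnn = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1, groups=dim),     # depthwise 3x3: local modeling
            nn.GELU(),
            nn.Conv2d(dim, dim, 1),
        )
        self.fuse = nn.Conv2d(2 * dim, dim, 1)                 # concat-then-project fusion (assumption)

    def forward(self, x):
        return x + self.fuse(torch.cat([self.attn(x), self.cnn(x)], dim=1))


if __name__ == "__main__":
    y = ESTCB(dim=64)(torch.randn(1, 64, 32, 32))
    print(y.shape)                                             # torch.Size([1, 64, 32, 32])
```

Restricting each query window to a fixed-size padded neighborhood keeps the attention cost linear in image size, which is one plausible way to widen the receptive field beyond plain Swin windows without resorting to full global attention.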
Pages: 15