Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules

Cited by: 659

Authors:

Cheng, Zhengxue [1]

Sun, Heming [2,3]

Takeuchi, Masaru [2]

Katto, Jiro [1]

Affiliations:

[1] Waseda Univ, Dept Comp Sci & Commun Engn, Tokyo, Japan

[2] Waseda Res Inst Sci & Engn, Tokyo, Japan

[3] JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama, Japan

Source:

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020) | 2020

Funding:

Japan Society for the Promotion of Science

Keywords:

DOI:

10.1109/CVPR42600.2020.00796

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Image compression is a fundamental research field, and many well-known compression standards have been developed over many decades. Recently, learned compression methods have developed rapidly with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We find that the accuracy of the entropy model used for rate estimation largely affects the optimization of network parameters and thus the rate-distortion performance. We therefore propose to use discretized Gaussian mixture likelihoods to parameterize the distributions of latent codes, which yields a more accurate and flexible entropy model. In addition, we incorporate recent attention modules into the network architecture to enhance performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge, our approach is the first to achieve performance comparable to the latest compression standard, Versatile Video Coding (VVC), in terms of PSNR. More importantly, our approach generates more visually pleasing results when optimized for MS-SSIM.
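The discretized Gaussian mixture likelihood named in the abstract can be illustrated with a small sketch. It assumes the common formulation in which a latent value is rounded to an integer and its probability is the mixture mass falling in the unit-width bin around that integer. The function names and the plain-list parameters are hypothetical and chosen for illustration only; in the paper the mixture weights, means, and scales are predicted by a network, and this is not the authors' implementation.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def discretized_gmm_likelihood(y, weights, means, scales):
    """Probability of the integer-quantized latent value y.

    Each mixture component contributes the Gaussian mass inside the
    bin [y - 0.5, y + 0.5], scaled by its mixture weight.
    Assumption: weights sum to 1 and every scale is positive.
    """
    total = 0.0
    for w, mu, sigma in zip(weights, means, scales):
        upper = normal_cdf((y + 0.5 - mu) / sigma)
        lower = normal_cdf((y - 0.5 - mu) / sigma)
        total += w * (upper - lower)
    return total


def estimated_bits(y, weights, means, scales, eps=1e-9):
    """Rate estimate in bits for one latent value: -log2 of its likelihood.

    eps is a hypothetical lower bound that avoids log(0) for values
    far in the tails.
    """
    p = max(discretized_gmm_likelihood(y, weights, means, scales), eps)
    return -math.log2(p)
```

As a usage note, `estimated_bits(0, [0.3, 0.7], [-1.2, 2.5], [0.8, 1.5])` gives the estimated cost of coding the value 0 under a two-component mixture. A mixture can place mass around several modes at once, which is the flexibility a single Gaussian lacks.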

Citation

Pages: 7936-7945

Page count: 10

References (36 in total)

[1]

Agustsson E, 2017, ADV NEUR IN, V30

[2]

[Anonymous], 2008, P 25 INT C MACH LEAR, DOI 10.1145/1390156.1390294

[3]

Balle J., 2017, ICLR, P1

[4]

Balle Johannes, 2018, P INT C LEARN REPR

[5]

bellard, BPG Image Format

[6]

Cheng Z., 2018, CVPR WORKSH CHALL LE

[7]

Cheng, Zhengxue; Sun, Heming; Takeuchi, Masaru; Katto, Jiro. Learning Image and Video Compression through Spatial-Temporal Energy Compaction. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 10063-10072

[8]

Cheng ZX, 2018, PICT COD SYMP, P253, DOI 10.1109/PCS.2018.8456308

[9]

Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848

[10]

Eirikur Agustsson M., arXiv, DOI 10.48550/arXiv.1804.02958
