Rate-Distortion Optimized Encoding for Deep Image Compression

Cited by: 4
Authors
Schafer, Michael [1]
Pientka, Sophie [1]
Pfaff, Jonathan [1, 2]
Schwarz, Heiko [1, 3]
Marpe, Detlev [1]
Wiegand, Thomas [1, 2, 4]
Affiliations
[1] Heinrich Hertz Inst Nachrichtentech Berlin GmbH, Fraunhofer Inst Telecommun, Video Commun & Applicat Dept, D-10587 Berlin, Germany
[2] Heinrich Hertz Inst Nachrichtentech Berlin GmbH, Fraunhofer Inst Telecommun, D-10587 Berlin, Germany
[3] Free Univ Berlin, Dept Math & Comp Sci, D-14195 Berlin, Germany
[4] Berlin Inst Technol, Dept Elect Engn & Comp Sci, D-10623 Berlin, Germany
Source
IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS | 2021 / Vol. 2
Keywords
Video coding; Image coding; Vector quantization; Nonlinear distortion; Bit rate; Rate-distortion; Signal processing algorithms; Deep image compression; variational auto-encoders; rate-distortion optimized encoding; non-linear transform coding; VIDEO; EFFICIENCY;
DOI
10.1109/OJCAS.2021.3124995
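Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract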
Deep-learned variational auto-encoders (VAEs) have shown remarkable capabilities for lossy image compression. These neural networks typically employ non-linear convolutional layers to find a compressible representation of the input image. Advanced techniques such as vector quantization, context-adaptive arithmetic coding and variable-rate compression have been implemented in these auto-encoders. Notably, these networks rely on an end-to-end approach, which differs fundamentally from hybrid, block-based video coding systems. As a result, signal-dependent encoder optimizations have not yet been thoroughly investigated for VAEs, even though rate-distortion optimized encoding heavily determines the compression performance of state-of-the-art video codecs. Designing such optimizations for non-linear, multi-layered networks requires understanding the relationship between the quantization, the bit allocation of the features and the distortion. This paper therefore examines the rate-distortion performance of a variable-rate VAE. In particular, it demonstrates that the trained encoder network typically finds features with a near-optimal bit allocation across the channels. Furthermore, it approximates the relationship between distortion and quantization by a higher-order polynomial whose coefficients can be robustly estimated. Based on these considerations, the authors investigate an encoding algorithm for the Lagrange optimization, which significantly improves the coding efficiency.
Pages: 633-647
Page count: 15
Related papers
39 in total
[1] Agustsson E, 2017, ADV NEUR IN, V30
[2] Akbari M., 2002, ARXIV200210032, V2020
[3] [Anonymous], 2013, ITU-T Rec. H.265
[4] [Anonymous], 2020, VERS VID COD
[5] [Anonymous], 2021, KODAK IMAGE DATASET
[6] Nonlinear Transform Coding [J]. Balle, Johannes; Chou, Philip A.; Minnen, David; Singh, Saurabh; Johnston, Nick; Agustsson, Eirikur; Hwang, Sung Jin; Toderici, George. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2021, 15 (02): 339-353
[7] End-to-end optimization of nonlinear transform codes for perceptual quality [J]. Balle, Johannes; Laparra, Valero; Simoncelli, Eero P. 2016 PICTURE CODING SYMPOSIUM (PCS), 2016
[8] Balle Johannes, 2016, P INT C LEARN REPR
[9] Balle Johannes, 2018, arXiv preprint arXiv:1802.01436
[10] Balle Johannes, 2017, INT C LEARN REPR