Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations

Cited by: 35

Authors:

Liu, Kang ^{[1]}

Liu, Dong ^{[1]}

Li, Li ^{[1]}

Yan, Ning ^{[1]}

Li, Houqiang ^{[1]}

Affiliations:

[1] Univ Sci & Technol China, CAS Key Lab Technol Geospatial Informat Proc & Ap, Hefei 230027, Peoples R China

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2021 / Vol. 129 / Issue 09

Keywords:

Deep learning; Image compression; Lifting structure; Machine vision; Scalable coding; DECOMPOSITION SCHEMES; LIFTING SCHEME; MODEL;

DOI:

10.1007/s11263-021-01491-7

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image/video compression and communication need to serve both human vision and machine vision. To address this need, we propose a scalable image compression solution. We assume that machine vision needs less information, namely that related to semantics, whereas human vision needs more information, namely that required to reconstruct the signal. We then propose semantics-to-signal scalable compression, where a partial bitstream is decodable for machine vision and the entire bitstream is decodable for human vision. Our method is inspired by the scalable image coding standard, JPEG2000, and similarly adopts subband-wise representations. We first design a trainable and revertible transform based on the lifting structure, which converts an image into a pyramid of multiple subbands; the transform is trained to make the partial representations useful for multiple machine vision tasks. We then design an end-to-end optimized encoding/decoding network for compressing the multiple subbands, to jointly optimize compression ratio, semantic analysis accuracy, and signal reconstruction quality. We experiment with two datasets, CUB200-2011 and FGVC-Aircraft, taking coarse-to-fine image classification tasks as an example. Experimental results demonstrate that our proposed method achieves semantics-to-signal scalable compression and outperforms JPEG2000 in compression efficiency. The proposed method sheds light on a generic approach to image/video coding for humans and machines.
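The revertibility that the abstract attributes to the lifting structure can be illustrated with a minimal sketch. This is a generic one-level, one-dimensional integer lifting step, not the authors' trained transform: the names `lifting_forward` and `lifting_inverse` and the toy `predict`/`update` operators are hypothetical and chosen only for illustration. The point it shows is that the inverse undoes the forward steps exactly for any choice of `predict` and `update`, which is why those operators can in principle be replaced by learned ones without losing invertibility.

```python
from typing import Callable, List, Tuple

Band = List[int]
Op = Callable[[Band], Band]  # any function mapping a band to a same-length band


def lifting_forward(x: Band, predict: Op, update: Op) -> Tuple[Band, Band]:
    """One lifting level: split into even/odd samples, then predict and update."""
    assert len(x) % 2 == 0, "this sketch assumes an even-length signal"
    even, odd = x[0::2], x[1::2]
    # Predict step: the high band is the residual of predicting odd from even.
    high = [o - p for o, p in zip(odd, predict(even))]
    # Update step: the low band is the even samples corrected using the residual.
    low = [e + u for e, u in zip(even, update(high))]
    return low, high


def lifting_inverse(low: Band, high: Band, predict: Op, update: Op) -> Band:
    """Undo the update step, then the predict step, then interleave the samples."""
    even = [l - u for l, u in zip(low, update(high))]
    odd = [h + p for h, p in zip(high, predict(even))]
    x = [0] * (len(even) + len(odd))
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    # Toy operators (assumptions): predict odd from its even neighbour,
    # update with half of the residual, rounded down to stay in integers.
    predict = lambda even: list(even)
    update = lambda high: [v // 2 for v in high]

    signal = [3, 7, 1, 8, 2, 9, 4, 4]
    low, high = lifting_forward(signal, predict, update)
    assert lifting_inverse(low, high, predict, update) == signal
```

Applying `lifting_forward` again to the `low` band would give a coarser level, which is one way a pyramid of subbands can be built; how the paper arranges its two-dimensional subbands is not specified in this record.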


Pages: 2605-2621

Page count: 17
