Semantics-to-Signal Scalable Image Compression with Learned Revertible Representations

Cited by: 35

Authors:

Liu, Kang [1]

Liu, Dong [1]

Li, Li [1]

Yan, Ning [1]

Li, Houqiang [1]

Affiliations:

[1] Univ Sci & Technol China, CAS Key Lab Technol Geospatial Informat Proc & Ap, Hefei 230027, Peoples R China

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2021 / Vol. 129 / No. 09

Keywords:

Deep learning; Image compression; Lifting structure; Machine vision; Scalable coding; DECOMPOSITION SCHEMES; LIFTING SCHEME; MODEL;

DOI:

10.1007/s11263-021-01491-7

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Image/video compression and communication need to serve both human vision and machine vision. To address this need, we propose a scalable image compression solution. We assume that machine vision needs less information, which is related to semantics, whereas human vision needs more information, which is used to reconstruct the signal. We then propose semantics-to-signal scalable compression, where a partial bitstream is decodable for machine vision and the entire bitstream is decodable for human vision. Our method is inspired by the scalable image coding standard, JPEG2000, and similarly adopts subband-wise representations. We first design a trainable and revertible transform based on the lifting structure, which converts an image into a pyramid of multiple subbands; the transform is trained to make the partial representations useful for multiple machine vision tasks. We then design an end-to-end optimized encoding/decoding network for compressing the multiple subbands, to jointly optimize compression ratio, semantic analysis accuracy, and signal reconstruction quality. We experiment with two datasets, CUB200-2011 and FGVC-Aircraft, taking coarse-to-fine image classification tasks as an example. Experimental results demonstrate that our proposed method achieves semantics-to-signal scalable compression and outperforms JPEG2000 in compression efficiency. The proposed method sheds light on a generic approach to image/video coding for humans and machines.
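
The revertibility of a lifting-based transform, mentioned in the abstract, can be illustrated with a minimal one-dimensional sketch. This is not the authors' trained network: the `predict` and `update` functions below are hypothetical fixed stand-ins for learned modules, and the input is assumed to be an integer sequence whose length is divisible by 2 raised to the number of levels. The point it shows is that the inverse recovers the input exactly whatever `predict` and `update` compute, because each step is undone by subtracting what was added.

```python
def split(signal):
    """Split a sequence into even- and odd-indexed samples."""
    return signal[0::2], signal[1::2]

def predict(even):
    # Hypothetical stand-in for a trained prediction module:
    # guess each odd sample from its even neighbour.
    return list(even)

def update(detail):
    # Hypothetical stand-in for a trained update module.
    return [d // 2 for d in detail]

def forward(signal):
    """One lifting step: returns (coarse subband, detail subband)."""
    even, odd = split(signal)
    detail = [o - p for o, p in zip(odd, predict(even))]
    coarse = [e + u for e, u in zip(even, update(detail))]
    return coarse, detail

def inverse(coarse, detail):
    """Undo one lifting step by reversing the order and the signs."""
    even = [c - u for c, u in zip(coarse, update(detail))]
    odd = [d + p for d, p in zip(detail, predict(even))]
    signal = [0] * (len(even) + len(odd))
    signal[0::2] = even
    signal[1::2] = odd
    return signal

def pyramid(signal, levels):
    """Apply the lifting step repeatedly to the coarse subband."""
    subbands = []
    coarse = signal
    for _ in range(levels):
        coarse, detail = forward(coarse)
        subbands.append(detail)
    return coarse, subbands

def reconstruct(coarse, subbands):
    """Invert the pyramid, starting from the coarsest level."""
    for detail in reversed(subbands):
        coarse = inverse(coarse, detail)
    return coarse

x = [3, 7, 1, 8, 2, 9, 4, 4]
coarse, subbands = pyramid(x, 2)
assert reconstruct(coarse, subbands) == x
```

In a scalable setting of the kind the abstract describes, a decoder holding only part of the pyramid would work from a partial representation, while a decoder holding every subband could invert the transform fully. Which subbands serve which task is determined by training in the paper and is not modelled in this sketch.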

Pages: 2605-2621

Page count: 17

