Scalable Image Coding for Humans and Machines

Cited by: 85

Authors:

Choi, Hyomin [1]

Bajic, Ivan V. [1]

Affiliations:

[1] Simon Fraser Univ, Sch Engn Sci, Burnaby, BC V5A 1S6, Canada

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2022 / Vol. 31

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Task analysis; Image reconstruction; Image coding; Scalability; Object detection; Multitasking; Transforms; Image compression; deep neural network; multitask network; scalable coding; latent-space scalability; BIT ALLOCATION;

DOI:

10.1109/TIP.2022.3160602

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

At present, and increasingly so in the future, much of the captured visual content will not be seen by humans. Instead, it will be used for automated machine vision analytics and may require occasional human viewing. Examples of such applications include traffic monitoring, visual surveillance, autonomous navigation, and industrial machine vision. To address such requirements, we develop an end-to-end learned image codec whose latent space is designed to support scalability from simpler to more complicated tasks. The simplest task is assigned to a subset of the latent space (the base layer), while more complicated tasks make use of additional subsets of the latent space, i.e., both the base and enhancement layer(s). For the experiments, we establish a 2-layer and a 3-layer model, each of which offers input reconstruction for human vision, plus machine vision task(s), and compare them with relevant benchmarks. The experiments show that our scalable codecs offer 37%-80% bitrate savings on machine vision tasks compared to best alternatives, while being comparable to state-of-the-art image codecs in terms of input reconstruction.
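The layered latent space described in the abstract can be illustrated with a minimal sketch. It only shows the bookkeeping idea: the simplest task reads the base layer, and each more complicated task reads the base layer plus further enhancement layer(s). The function names `split_latent` and `latent_for_task`, the flat-list representation, and the layer sizes are hypothetical choices for illustration; they are not taken from the paper, and the learned codec itself is not modeled here.

```python
from typing import List, Sequence


def split_latent(latent: Sequence[float], layer_sizes: Sequence[int]) -> List[List[float]]:
    """Partition a flat latent vector into consecutive layers, base layer first.

    The layer sizes are an assumption for illustration, not values from the paper.
    """
    if sum(layer_sizes) != len(latent):
        raise ValueError("layer sizes must add up to the latent length")
    layers: List[List[float]] = []
    start = 0
    for size in layer_sizes:
        layers.append(list(latent[start:start + size]))
        start += size
    return layers


def latent_for_task(layers: Sequence[Sequence[float]], task_level: int) -> List[float]:
    """Return the latent entries a task may use.

    task_level 0 uses only the base layer; task_level k uses the base layer
    plus the first k enhancement layers.
    """
    if not 0 <= task_level < len(layers):
        raise ValueError("task_level out of range")
    used: List[float] = []
    for layer in layers[:task_level + 1]:
        used.extend(layer)
    return used


# Hypothetical 3-layer split of an 8-element latent vector.
latent = [float(i) for i in range(8)]
layers = split_latent(latent, [2, 3, 3])
base_only = latent_for_task(layers, 0)    # simplest machine vision task
two_layers = latent_for_task(layers, 1)   # more complicated task
all_layers = latent_for_task(layers, 2)   # e.g. input reconstruction
```

In this sketch a decoder for the simplest task would only need the part of the bitstream that carries `base_only`, which is the intuition behind the bitrate savings the abstract reports for machine vision tasks.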

Citation

Pages: 2739-2754

Page count: 16
