Scalable Image Coding for Humans and Machines

Cited by: 55
Authors
Choi, Hyomin [1]
Bajic, Ivan V. [1]
Affiliation
[1] Simon Fraser Univ, Sch Engn Sci, Burnaby, BC V5A 1S6, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Task analysis; Image reconstruction; Image coding; Scalability; Object detection; Multitasking; Transforms; Image compression; deep neural network; multitask network; scalable coding; latent-space scalability; BIT ALLOCATION;
DOI
10.1109/TIP.2022.3160602
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
At present, and increasingly so in the future, much of the captured visual content will not be seen by humans. Instead, it will be used for automated machine vision analytics and may require occasional human viewing. Examples of such applications include traffic monitoring, visual surveillance, autonomous navigation, and industrial machine vision. To address such requirements, we develop an end-to-end learned image codec whose latent space is designed to support scalability from simpler to more complicated tasks. The simplest task is assigned to a subset of the latent space (the base layer), while more complicated tasks make use of additional subsets of the latent space, i.e., both the base and enhancement layer(s). For the experiments, we establish a 2-layer and a 3-layer model, each of which offers input reconstruction for human vision, plus machine vision task(s), and compare them with relevant benchmarks. The experiments show that our scalable codecs offer 37%-80% bitrate savings on machine vision tasks compared to best alternatives, while being comparable to state-of-the-art image codecs in terms of input reconstruction.
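The layered latent space described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the channel-wise split, the function names `split_latent` and `channels_for_task`, the `base_channels` parameter, and the task labels are all hypothetical and chosen only for illustration. The sketch assumes a 2-layer model in which the machine vision task decodes the base layer alone and input reconstruction for human viewing decodes the base and enhancement layers together.

```python
from typing import List, Sequence, Tuple

# One latent channel, flattened to a list of floats (hypothetical representation).
Channel = List[float]


def split_latent(
    latent: Sequence[Channel], base_channels: int
) -> Tuple[List[Channel], List[Channel]]:
    """Split latent channels into a base layer and an enhancement layer.

    Assumption: the base layer is simply the first `base_channels` channels.
    """
    if not 0 < base_channels <= len(latent):
        raise ValueError("base_channels must be between 1 and len(latent)")
    return list(latent[:base_channels]), list(latent[base_channels:])


def channels_for_task(
    latent: Sequence[Channel], base_channels: int, task: str
) -> List[Channel]:
    """Return the latent channels a decoder would need for the given task.

    "machine": the simpler task, served by the base layer only.
    "human":   input reconstruction, served by base + enhancement layers.
    """
    base, enhancement = split_latent(latent, base_channels)
    if task == "machine":
        return base
    if task == "human":
        return base + enhancement
    raise ValueError(f"unknown task: {task!r}")


if __name__ == "__main__":
    # Toy latent with 6 channels of 4 values each.
    toy_latent = [[float(i)] * 4 for i in range(6)]
    machine_part = channels_for_task(toy_latent, base_channels=2, task="machine")
    human_part = channels_for_task(toy_latent, base_channels=2, task="human")
    print(len(machine_part), len(human_part))
```

In this toy setup the machine vision decoder only needs the base-layer portion of the bitstream, which is where bitrate savings on machine tasks would come from, while occasional human viewing requests the remaining channels as well.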
Pages: 2739-2754
Page count: 16