Slimmable Multi-Task Image Compression for Human and Machine Vision

Cited by: 6
Authors
Cao, Jiangzhong [1]
Yao, Ximei [1]
Zhang, Huan [1]
Jin, Jian [2]
Zhang, Yun [3]
Ling, Bingo Wing-Kuen [1]
Affiliations
[1] Guangdong Univ Technol, Sch Informat Engn, Guangzhou 510006, Peoples R China
[2] Nanyang Technol Univ, Alibaba NTU Singapore Joint Res Inst, Singapore 639798, Singapore
[3] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen 518107, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image coding; Task analysis; Machine vision; Multitasking; Image reconstruction; Internet of Things; Streaming media; Image compression; feature compression; collaborative compression; intelligent analytics
DOI
10.1109/ACCESS.2023.3261668
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In Internet of Things (IoT) communications, visual data are frequently processed among intelligent devices using artificial intelligence algorithms, which replace humans for analysis and decision-making and only occasionally require human scrutiny. However, because compressive encoders are highly redundant, existing image coding solutions for machine vision are inefficient at runtime. To balance the rate-accuracy performance and efficiency of image compression for machine vision while attaining high-quality reconstructed images for human vision, this paper introduces a novel slimmable multi-task compression framework for human and machine vision in visual IoT applications. Firstly, image compression for human and machine vision under bandwidth, latency, and computational resource constraints is modeled as a multi-task optimization problem. Secondly, slimmable encoders are employed for multiple human and machine vision tasks, in which the parameters of the sub-encoder for machine vision tasks are shared among all tasks and jointly learned. Thirdly, to resolve the mismatch between the latent representation and the intermediate features of deep vision networks, feature transformation networks are introduced as decoders for machine vision feature compression. Finally, the proposed framework is applied to scenarios that combine human and machine vision tasks, e.g., object detection and image reconstruction. Experimental results show that the proposed method outperforms baselines and other image compression approaches on machine vision tasks with higher efficiency (shorter latency) in two vision-task scenarios while retaining comparable image reconstruction quality.
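The two core ideas in the abstract, a narrow sub-encoder whose parameters are a shared slice of the full encoder, and a multi-task objective that trades bitrate against per-task losses, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SlimmableLinear`, the function `multi_task_loss`, the weighted-sum form of the objective, and all numeric values are hypothetical and chosen only for illustration. A single linear layer stands in for a full convolutional encoder.

```python
from typing import List

Matrix = List[List[float]]


class SlimmableLinear:
    """Hypothetical linear layer whose narrower variant reuses the
    leading rows of one shared weight matrix (assumed slicing scheme)."""

    def __init__(self, weight: Matrix):
        self.weight = weight  # shape: (out_features, in_features)

    def forward(self, x: List[float], out_width: int) -> List[float]:
        if not 1 <= out_width <= len(self.weight):
            raise ValueError("out_width must be between 1 and out_features")
        if len(x) != len(self.weight[0]):
            raise ValueError("input length must equal in_features")
        # Only the first `out_width` rows are evaluated, so the narrow
        # sub-encoder costs less and shares its parameters with the full one.
        return [sum(w * v for w, v in zip(row, x))
                for row in self.weight[:out_width]]


def multi_task_loss(rate: float, task_losses: List[float],
                    task_weights: List[float]) -> float:
    """Assumed objective: bitrate plus a weighted sum of per-task losses."""
    return rate + sum(w * d for w, d in zip(task_weights, task_losses))


layer = SlimmableLinear([[1.0, 0.0, 2.0],
                         [0.0, 1.0, 1.0],
                         [3.0, 1.0, 0.0]])
x = [1.0, 2.0, 3.0]

full = layer.forward(x, out_width=3)  # full encoder, e.g. for reconstruction
slim = layer.forward(x, out_width=2)  # sub-encoder, e.g. for machine vision

# Illustrative numbers: rate 0.5, task losses 2.0 and 4.0, weights 0.25 and 0.5.
loss = multi_task_loss(0.5, [2.0, 4.0], [0.25, 0.5])

print(full, slim, loss)  # [7.0, 5.0, 5.0] [7.0, 5.0] 3.0
```

Because the narrow output equals the leading entries of the full output, a device could, under this assumed scheme, run only the sub-encoder for machine analysis and switch to the full width when a human needs to inspect the image.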
Pages: 29946-29958
Number of pages: 13