Human-Machine Collaborative Image Compression Method Based on Implicit Neural Representations

Cited by: 1
Authors
Li, Huanyang [1]
Zhang, Xinfeng [1]
Affiliations
[1] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
Keywords
Image compression; image coding for machine; implicit neural representation
DOI
10.1109/JETCAS.2024.3386639
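Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract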
With the explosive growth in the volume of images intended for analysis by AI, image coding for machines has been proposed to transmit information in a machine-interpretable format, thereby improving image compression efficiency. However, because of their high compression ratios, such coding schemes often lose image details and features and blur semantic information, making them less suitable for human viewing. Balancing visual quality and machine vision accuracy at a given compression ratio is therefore a critical problem. To address it, we introduce a human-machine collaborative image coding framework based on Implicit Neural Representations (INR), which effectively reduces the information transmitted for machine vision tasks at the decoding side while maintaining high-efficiency image compression for human vision relative to the INR compression framework. To enhance the model's perception of images for machine vision, we design a semantic embedding enhancement module to assist in understanding image semantics. Specifically, we employ the Swin Transformer model to initialize image features, ensuring that the embeddings of the compression model are effectively applicable to downstream visual tasks. Extensive experimental results demonstrate that our method significantly outperforms other image compression methods in classification tasks while preserving image compression efficiency.
Pages: 198-208
Number of pages: 11