End-to-end optimized image compression with the frequency-oriented transform

Cited by: 2
Authors
Zhang, Yuefeng [1]
Lin, Kai [2]
Affiliations
[1] Beijing Inst Comp Technol & Applicat, 51th Yongding Rd, Beijing 100039, Peoples R China
[2] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China
Keywords
Image compression; Image processing; Computer vision; Machine learning
DOI
10.1007/s00138-023-01507-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Image compression is a significant challenge in the era of information explosion. Recent studies using deep learning have shown that learning-based image compression methods outperform traditional codecs. However, an inherent weakness of these methods is their lack of interpretability. Following an analysis of how compression degradation varies across frequency bands, we propose an end-to-end optimized image compression model built on a frequency-oriented transform. The model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, which aligns with a human-interpretable concept. Under the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments demonstrate that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify that the proposed compression method preserves semantic fidelity in addition to signal-level precision.
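The idea of splitting an image into non-overlapping frequency bands and transmitting only some of them can be illustrated with a minimal sketch. This is not the learned frequency-oriented transform of the paper: it assumes fixed radial FFT masks as a stand-in, and the function names `split_frequency_bands` and `reconstruct`, as well as the choice of three equally spaced bands, are hypothetical.

```python
import numpy as np


def split_frequency_bands(image, num_bands=3):
    """Split a 2-D image into non-overlapping radial frequency bands.

    Assumption: fixed FFT masks stand in for a learned transform.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, cycles/pixel
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Equally spaced band edges from DC up to the highest frequency present.
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    spectrum = np.fft.fft2(image)

    bands = []
    for i in range(num_bands):
        lo, hi = edges[i], edges[i + 1]
        if i == num_bands - 1:
            mask = (radius >= lo) & (radius <= hi)  # last band keeps the top edge
        else:
            mask = (radius >= lo) & (radius < hi)
        # Masks depend only on |frequency|, so each band is real-valued
        # up to floating-point error.
        bands.append(np.real(np.fft.ifft2(spectrum * mask)))
    return bands


def reconstruct(bands, keep):
    """Sum only the selected bands, mimicking selective transmission."""
    return sum(bands[i] for i in keep)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    bands = split_frequency_bands(image)

    full = reconstruct(bands, keep=[0, 1, 2])
    coarse = reconstruct(bands, keep=[0])
    print("all bands recover the image:", np.allclose(full, image))
    print("low band only, mean squared error:", np.mean((coarse - image) ** 2))
```

Because the masks partition the spectrum, summing all bands recovers the input, and dropping bands degrades the reconstruction gradually, which is the property that makes scalable coding possible in this simplified setting.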
Pages: 14