Insights From Generative Modeling for Neural Video Compression

Cited by: 0

Authors:

Yang, Ruihan [1]

Yang, Yibo [1]

Marino, Joseph [2]

Mandt, Stephan [1]

Affiliations:

[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA

[2] CALTECH, DeepMind, Pasadena, CA 91125 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / No. 08

Funding:

U.S. National Science Foundation;

Keywords:

Transforms; Video compression; Data models; Image coding; Predictive coding; Streaming media; Rate-distortion; Autoregressive models; generative models; normalizing flow; variational inference; video compression;

DOI:

10.1109/TPAMI.2023.3260684

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present these codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured priors. We propose several architectures that yield state-of-the-art video compression performance on high-resolution video and discuss their tradeoffs and ablations. In particular, we propose (i) improved temporal autoregressive transforms, (ii) improved entropy models with structured and temporal dependencies, and (iii) variable bitrate versions of our algorithms. Since our improvements are compatible with a large class of existing models, we provide further evidence that the generative modeling viewpoint can advance the neural video coding field.


Pages: 9908-9921

Page count: 14

