Insights From Generative Modeling for Neural Video Compression

Authors
Yang, Ruihan [1]
Yang, Yibo [1]
Marino, Joseph [2]
Mandt, Stephan [1]
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[2] CALTECH, DeepMind, Pasadena, CA 91125 USA
Funding
National Science Foundation (USA)
Keywords
Transforms; Video compression; Data models; Image coding; Predictive coding; Streaming media; Rate-distortion; Autoregressive models; generative models; normalizing flow; variational inference
DOI
10.1109/TPAMI.2023.3260684
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present these codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured priors. We propose several architectures that yield state-of-the-art video compression performance on high-resolution video and discuss their tradeoffs and ablations. In particular, we propose (i) improved temporal autoregressive transforms, (ii) improved entropy models with structured and temporal dependencies, and (iii) variable bitrate versions of our algorithms. Since our improvements are compatible with a large class of existing models, we provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
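As a rough illustration of the ideas named in the abstract, the sketch below codes each frame with a shift-and-scale transform conditioned on the previous reconstruction and scores the result with a rate-distortion objective (distortion plus `beta` times rate). It is not the authors' architecture: `predict_shift_scale` is a hypothetical stand-in for learned networks, the scale is fixed at 0.5, rounding stands in for learned quantization, and the empirical entropy of the symbols is only a proxy for the bitrate an entropy model would give.

```python
import numpy as np

def predict_shift_scale(prev_recon):
    """Hypothetical stand-in for learned networks conditioned on the past.

    Returns a per-pixel shift (the prediction) and a positive scale.
    """
    shift = prev_recon                      # assume "no motion" as the prediction
    scale = np.full_like(prev_recon, 0.5)   # fixed scale; a real model would learn it
    return shift, scale

def encode_frame(frame, prev_recon):
    """Forward transform: normalize the frame given the past, then quantize."""
    shift, scale = predict_shift_scale(prev_recon)
    return np.round((frame - shift) / scale)

def decode_frame(symbols, prev_recon):
    """Inverse transform: rescale and shift the symbols back to pixel space."""
    shift, scale = predict_shift_scale(prev_recon)
    return shift + scale * symbols

def empirical_rate_bits(symbols):
    """Rate proxy: empirical entropy of the symbols, in bits per element."""
    _, counts = np.unique(symbols, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

def code_sequence(frames, beta=0.1):
    """Closed-loop coding of a sequence; returns reconstructions and mean loss."""
    prev_recon = np.zeros_like(frames[0])   # no past available for the first frame
    recons, losses = [], []
    for frame in frames:
        symbols = encode_frame(frame, prev_recon)
        recon = decode_frame(symbols, prev_recon)
        distortion = float(np.mean((frame - recon) ** 2))
        losses.append(distortion + beta * empirical_rate_bits(symbols))
        recons.append(recon)
        prev_recon = recon                  # decoder only ever sees reconstructions
    return np.stack(recons), float(np.mean(losses))

# Toy data: 4 random 8x8 "frames" with float pixel values.
rng = np.random.default_rng(0)
frames = rng.uniform(0.0, 255.0, size=(4, 8, 8))
recons, loss = code_sequence(frames)
```

Because the encoder conditions on the previous reconstruction rather than the previous original frame, encoder and decoder stay in sync, and with a scale of 0.5 the rounding step bounds the per-pixel error at 0.25.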
Pages: 9908-9921
Number of pages: 14