Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers

Cited by: 3
Authors
Chen, Zhenghao [1, 3]
Relic, Lucas [2]
Azevedo, Roberto [3]
Zhang, Yang [3]
Gross, Markus [2]
Xu, Dong [4]
Zhou, Luping [1]
Schroers, Christopher [3]
Affiliations
[1] Univ Sydney, Sydney, NSW, Australia
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] DisneyRes Studios, Zurich, Switzerland
[4] Univ Hong Kong, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
Video compression; neural network; transformer
DOI
10.1145/3581783.3611960
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although existing neural video compression (NVC) methods have achieved significant success, most of them focus on either temporal or spatial information separately. They generally use simple operations such as concatenation or subtraction to utilize this information, but such operations only partially exploit spatio-temporal redundancies. This work aims to effectively and jointly leverage robust temporal and spatial information by proposing a new 3D-based transformer module: the Spatio-Temporal Cross-Covariance Transformer (ST-XCT). The ST-XCT module combines two individually extracted features into a joint spatio-temporal feature, followed by 3D convolutional operations and a novel spatio-temporal-aware cross-covariance attention mechanism. Unlike conventional transformers, the cross-covariance attention mechanism is applied across the feature channels without breaking down the spatio-temporal features into local tokens. This design allows for modeling global cross-channel correlations of the spatio-temporal context while lowering the computational requirement. Based on ST-XCT, we introduce a novel transformer-based end-to-end optimized NVC framework. ST-XCT-based modules are integrated into various key coding components of NVC, such as feature extraction, frame reconstruction, and entropy modeling, demonstrating their generalizability. Extensive experiments show that our ST-XCT-based NVC proposal achieves state-of-the-art compression performance on various standard video benchmark datasets.
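To illustrate the idea of attention applied across feature channels rather than across local tokens, here is a minimal NumPy sketch of a generic single-head cross-covariance attention over a joint spatio-temporal feature. It is not the authors' ST-XCT module: the 3D convolutions are omitted, the projection matrices `w_q`, `w_k`, `w_v` are random stand-ins for learned weights, the temperature `tau` is fixed, and stacking two features along a new temporal axis is only an assumed way of forming the joint feature. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_covariance_attention(feat, w_q, w_k, w_v, tau=1.0):
    """Channel-wise attention over a (C, T, H, W) feature.

    w_q, w_k, w_v are (C, C) channel-mixing matrices, used here as
    hypothetical stand-ins for learned projections.
    """
    c, t, h, w = feat.shape
    x = feat.reshape(c, t * h * w)           # (C, N), N = T*H*W positions
    q, k, v = w_q @ x, w_k @ x, w_v @ x      # each (C, N)
    # L2-normalize every channel over all spatio-temporal positions.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-6)
    # The attention map is C x C: its size does not depend on N.
    attn = softmax((q @ k.T) / tau, axis=-1)
    out = attn @ v                           # (C, N)
    return out.reshape(c, t, h, w), attn

# Usage: stack two (C, H, W) features along a temporal axis (an assumption).
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
feat_a = rng.standard_normal((C, H, W))
feat_b = rng.standard_normal((C, H, W))
joint = np.stack([feat_a, feat_b], axis=1)   # (C, 2, H, W)
w_q, w_k, w_v = (rng.standard_normal((C, C)) for _ in range(3))
out, attn = cross_covariance_attention(joint, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Because the attention map has one entry per channel pair, the cost of forming it grows linearly with the number of spatio-temporal positions, whereas token-to-token attention grows quadratically with it.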
Pages: 8543-8551
Number of pages: 9