Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers

Cited by: 3
Authors
Chen, Zhenghao [1, 3]
Relic, Lucas [2]
Azevedo, Roberto [3]
Zhang, Yang [3]
Gross, Markus [2]
Xu, Dong [4]
Zhou, Luping [1]
Schroers, Christopher [3]
Affiliations
[1] Univ Sydney, Sydney, NSW, Australia
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] DisneyRes Studios, Zurich, Switzerland
[4] Univ Hong Kong, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
Video compression; neural network; transformer
DOI
10.1145/3581783.3611960
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although existing neural video compression (NVC) methods have achieved significant success, most of them focus on improving either temporal or spatial information separately. They generally use simple operations such as concatenation or subtraction to utilize this information, but such operations only partially exploit spatio-temporal redundancies. This work aims to effectively and jointly leverage robust temporal and spatial information by proposing a new 3D-based transformer module: the Spatio-Temporal Cross-Covariance Transformer (ST-XCT). The ST-XCT module combines two individually extracted features into a joint spatio-temporal feature, followed by 3D convolutional operations and a novel spatio-temporal-aware cross-covariance attention mechanism. Unlike conventional transformers, the cross-covariance attention mechanism is applied across the feature channels without breaking down the spatio-temporal features into local tokens. This design allows for modeling global cross-channel correlations of the spatio-temporal context while lowering the computational requirement. Based on ST-XCT, we introduce a novel transformer-based end-to-end optimized NVC framework. ST-XCT-based modules are integrated into various key coding components of NVC, such as feature extraction, frame reconstruction, and entropy modeling, demonstrating the module's generalizability. Extensive experiments show that our ST-XCT-based NVC proposal achieves state-of-the-art compression performance on various standard video benchmark datasets.
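The channel-wise attention described in the abstract can be illustrated with a minimal sketch of generic cross-covariance attention. This is not the paper's ST-XCT module: the 3D convolutions, learned projections, and multi-head structure are omitted, and the function name `cross_covariance_attention`, the `tau` parameter, and the flat channel layout (C channels, each holding N = T*H*W values) are assumptions made for illustration only.

```python
import math


def l2_normalize(row):
    """Scale a channel to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in row)) or 1.0
    return [x / norm for x in row]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_covariance_attention(q, k, v, tau=1.0):
    """Attention across channels rather than across tokens.

    q, k, v: lists of C channels, each a flat list of N = T*H*W values
    (hypothetical layout). Returns the C x C attention map and the
    C x N output.
    """
    q_hat = [l2_normalize(ch) for ch in q]
    k_hat = [l2_normalize(ch) for ch in k]
    # Channel-to-channel similarity: the map is C x C regardless of N,
    # so its size does not grow with the number of spatio-temporal positions.
    attn = [
        softmax([sum(a * b for a, b in zip(qi, kj)) / tau for kj in k_hat])
        for qi in q_hat
    ]
    n = len(v[0])
    # Each output channel is a weighted mix of all value channels.
    out = [
        [sum(w * vj[p] for w, vj in zip(weights, v)) for p in range(n)]
        for weights in attn
    ]
    return attn, out


# Toy input: C = 3 channels over T*H*W = 2*2*2 = 8 positions.
x = [[float((c + 1) * (p % 3) - c) for p in range(8)] for c in range(3)]
attn, out = cross_covariance_attention(x, x, x)
```

In this sketch the attention map has C x C entries, whereas token-based self-attention over the same input would need an N x N map, which is the source of the lower computational requirement the abstract mentions.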
Pages: 8543 - 8551
Number of pages: 9