Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers

Cited by: 3
Authors
Chen, Zhenghao [1, 3]
Relic, Lucas [2]
Azevedo, Roberto [3]
Zhang, Yang [3]
Gross, Markus [2]
Xu, Dong [4]
Zhou, Luping [1]
Schroers, Christopher [3]
Affiliations
[1] Univ Sydney, Sydney, NSW, Australia
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] DisneyRes Studios, Zurich, Switzerland
[4] Univ Hong Kong, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023
Keywords
Video compression; neural network; transformer;
DOI
10.1145/3581783.3611960
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although existing neural video compression (NVC) methods have achieved significant success, most of them focus on improving either temporal or spatial information separately. They generally use simple operations such as concatenation or subtraction to utilize this information, which only partially exploit spatio-temporal redundancies. This work aims to effectively and jointly leverage robust temporal and spatial information by proposing a new 3D-based transformer module: the Spatio-Temporal Cross-Covariance Transformer (ST-XCT). The ST-XCT module combines two individually extracted features into a joint spatio-temporal feature, followed by 3D convolutional operations and a novel spatio-temporal-aware cross-covariance attention mechanism. Unlike conventional transformers, the cross-covariance attention mechanism is applied across the feature channels without breaking down the spatio-temporal features into local tokens. This design allows for modeling global cross-channel correlations of the spatio-temporal context while lowering the computational requirement. Based on ST-XCT, we introduce a novel transformer-based end-to-end optimized NVC framework. ST-XCT-based modules are integrated into various key coding components of NVC, such as feature extraction, frame reconstruction, and entropy modeling, demonstrating its generalizability. Extensive experiments show that our ST-XCT-based NVC proposal achieves state-of-the-art compression performance on various standard video benchmark datasets.
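The channel-wise attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer details, so the function name `cross_covariance_attention`, the `temperature` parameter, the L2 normalisation step, and the stacking of two features along a length-2 time axis are all assumptions, loosely modelled on the general cross-covariance attention idea. The learned projections and 3D convolutions mentioned in the abstract are omitted. The point it shows is that the attention map has shape (C, C), so its size depends on the channel count rather than on the number of spatio-temporal positions.

```python
import numpy as np


def cross_covariance_attention(q, k, v, temperature=1.0):
    """Attention across channels (hypothetical helper, not from the paper).

    q, k, v: arrays of shape (C, N), where N = T*H*W flattened positions.
    Returns the attended features (C, N) and the attention map (C, C).
    """
    # Assumption: L2-normalise each channel over its N positions.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)

    # Channel-to-channel similarity: (C, N) @ (N, C) -> (C, C).
    logits = (q @ k.T) * temperature

    # Row-wise softmax, shifted by the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)

    # Each output channel is a weighted mix of the value channels.
    return attn @ v, attn


rng = np.random.default_rng(0)
C, H, W = 8, 4, 4

# Two separately extracted features (random stand-ins), stacked along a
# time axis to form one joint spatio-temporal feature of shape (C, T, H, W).
spatial_feat = rng.standard_normal((C, H, W))
temporal_feat = rng.standard_normal((C, H, W))
joint = np.stack([spatial_feat, temporal_feat], axis=1)
T = joint.shape[1]

tokens = joint.reshape(C, -1)  # (C, T*H*W); no split into local tokens
out, attn = cross_covariance_attention(tokens, tokens, tokens)
out = out.reshape(C, T, H, W)
```

With these toy sizes the attention map holds C x C = 64 entries, whereas attention over the T*H*W = 32 positions would hold 32 x 32 = 1024; the gap widens quickly at realistic video resolutions.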
Pages: 8543-8551
Page count: 9