Spatial-Temporal Sequence Attention Based Efficient Transformer for Video Snow Removal

Cited by: 0
Authors
Gao, Tao [1]
Zhang, Qianxi [2]
Chen, Ting [3]
Wen, Yuanbo [3]
Affiliations
[1] Changan Univ, Sch Data Sci & Artificial Intelligence, Xian 710064, Peoples R China
[2] Changan Univ, Sch Informat Engn, Xian 710064, Peoples R China
[3] Changan Univ, Sch Informat Engn, Xian 710064, Peoples R China
Source
BIG DATA MINING AND ANALYTICS | 2025 / Vol. 8 / Issue 03
Funding
National Natural Science Foundation of China;
Keywords
video restoration; vision Transformer; window attention; computer vision; neural representation;
DOI
10.26599/BDMA.2024.9020061
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video snow removal has tremendous potential for enhancing video quality and boosting the performance of computer vision tasks. Recently, Transformers have gained attention for their self-attention mechanism. However, the memory consumption of self-attention is considerable, which limits its application in high-resolution video restoration. In this paper, we propose an efficient spatio-temporal Transformer for video desnowing. It uses spatio-temporal sequence attention to capture intra-frame spatial information and inter-frame temporal information in parallel, with much lower memory consumption than standard self-attention. Additionally, we mitigate the impact of snowflake occlusion on video frame alignment by leveraging an atmospheric scattering model. Furthermore, we introduce the concept of Neural Representation for Videos (NeRV) and use the recovery NeRV module to reconstruct compressed videos after multi-resolution feature extraction, thereby further reducing computational cost. Extensive experiments demonstrate that the model achieves superior performance in video snow removal while significantly reducing the computational resources required.
Pages: 551-562
Number of pages: 12