Cross-scale hierarchical spatio-temporal transformer for video enhancement

Citations: 0
Authors
Jiang, Qin [1 ,2 ,3 ]
Wang, Qinglin [1 ,2 ,3 ]
Chi, Lihua [4 ]
Liu, Jie [1 ,2 ,3 ]
Affiliations
[1] Natl Univ Def Technol, Changsha, Peoples R China
[2] Lab Digitizing Software Frontier Equipment, Changsha, Peoples R China
[3] Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China
[4] Hunan GuoKe Computil Technol Co Ltd, Changsha, Peoples R China
Keywords
Video super-resolution; Denoising; Deblurring; Transformer; Temporal
DOI
10.1016/j.knosys.2024.112773
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The diversity and complexity of degradations in low-quality videos make it non-trivial for video enhancement methods to reconstruct their high-quality counterparts. Prevailing sliding-window-based methods perform poorly because of the limited window size. Recurrent networks take advantage of long-term modeling to aggregate more information, resulting in significant performance improvements. However, most of them are trained on simply degraded data and can only tackle a specific degradation. To overcome this limitation, we propose a progressive alignment network, namely the Cross-scale Hierarchical Spatio-Temporal Transformer (CHSTT), which leverages cross-scale tokenization to generate multi-scale visual tokens over the entire sequence and thereby capture extensive long-range temporal dependencies. To enhance spatial and temporal interactions, we introduce an innovative hierarchical Transformer that facilitates the computation of mutual multi-head attention across both spatial and temporal dimensions. Quantitative and qualitative assessments substantiate the superior performance of CHSTT compared to several state-of-the-art benchmarks across three distinct video enhancement tasks: video super-resolution, video denoising, and video deblurring.
Pages: 13