Cross-scale hierarchical spatio-temporal transformer for video enhancement

Cited by: 0
Authors
Jiang, Qin [1, 2, 3]
Wang, Qinglin [1, 2, 3]
Chi, Lihua [4]
Liu, Jie [1, 2, 3]
Affiliations
[1] Natl Univ Def Technol, Changsha, Peoples R China
[2] Lab Digitizing Software Frontier Equipment, Changsha, Peoples R China
[3] Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China
[4] Hunan GuoKe Computil Technol Co Ltd, Changsha, Peoples R China
Keywords
Video super-resolution; Denoising; Deblurring; Transformer; Temporal
DOI
10.1016/j.knosys.2024.112773
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The diversity and complexity of degradations in low-quality videos pose non-trivial challenges for video enhancement, which aims to reconstruct their high-quality counterparts. Prevailing sliding-window-based methods perform poorly because their limited window size restricts the temporal information they can exploit. Recurrent networks use long-term modeling to aggregate more information, yielding significant performance improvements. However, most of these methods are trained on simply degraded data and can handle only a specific type of degradation. To overcome this limitation, we propose a progressive alignment network, namely the Cross-scale Hierarchical Spatio-Temporal Transformer (CHSTT), which uses cross-scale tokenization to generate multi-scale visual tokens across the entire sequence and thereby capture extensive long-range temporal dependencies. To strengthen spatial and temporal interactions, we introduce a hierarchical Transformer that computes mutual multi-head attention across both the spatial and temporal dimensions. Quantitative and qualitative evaluations show that CHSTT outperforms several state-of-the-art methods on three video enhancement tasks: video super-resolution, video denoising, and video deblurring.
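The abstract names two mechanisms, cross-scale tokenization over the whole sequence and multi-head attention spanning space and time, without giving their details. As a rough illustration only, the NumPy sketch below shows one generic way such ideas can fit together; it is not CHSTT's architecture. The patch sizes, the random untrained projections, and the helper names `patchify`, `cross_scale_tokens`, and `multi_head_self_attention` are all hypothetical, and a single joint attention layer stands in for the paper's hierarchical Transformer.

```python
import numpy as np


def patchify(video, p):
    """Split each frame of a (T, H, W, C) video into non-overlapping p x p patches.

    Returns shape (T, N, p*p*C) with N = (H // p) * (W // p).
    Assumes H and W are divisible by p.
    """
    T, H, W, C = video.shape
    x = video.reshape(T, H // p, p, W // p, p, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, H/p, W/p, p, p, C)
    return x.reshape(T, (H // p) * (W // p), p * p * C)


def cross_scale_tokens(video, patch_sizes, dim, rng):
    """Hypothetical cross-scale tokenizer (illustrative, not the paper's).

    Patchifies at several scales, projects each scale to a shared width `dim`
    with a random, untrained linear map, and concatenates every frame and
    every scale into one token sequence covering the whole clip.
    """
    C = video.shape[-1]
    tokens = []
    for p in patch_sizes:
        patches = patchify(video, p)  # (T, N_p, p*p*C)
        proj = rng.standard_normal((p * p * C, dim)) / np.sqrt(p * p * C)
        tokens.append((patches @ proj).reshape(-1, dim))  # (T * N_p, dim)
    return np.concatenate(tokens, axis=0)


def multi_head_self_attention(x, num_heads, rng):
    """Plain multi-head self-attention over all tokens (random weights).

    Because tokens from all frames, positions, and scales share one sequence,
    each token can attend both spatially and temporally.
    """
    n, dim = x.shape
    assert dim % num_heads == 0, "dim must be divisible by num_heads"
    hd = dim // num_heads
    wq, wk, wv, wo = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4))

    def split(m):  # (n, dim) -> (heads, n, hd)
        return m.reshape(n, num_heads, hd).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)  # (heads, n, n)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over keys
    out = (attn @ v).transpose(1, 0, 2).reshape(n, dim)  # merge heads
    return out @ wo, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.standard_normal((5, 16, 16, 3))  # 5 frames of 16x16 RGB
    tokens = cross_scale_tokens(video, patch_sizes=(4, 8), dim=32, rng=rng)
    out, attn = multi_head_self_attention(tokens, num_heads=4, rng=rng)
    print(tokens.shape)  # (100, 32)
```

With 5 frames, 4x4 patches give 16 tokens per frame and 8x8 patches give 4, so the clip yields 100 tokens in total. Joint attention over every token costs time and memory quadratic in that count, which is one common reason designs split attention hierarchically rather than attending over the full sequence at once.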
Pages: 13