Cross-scale hierarchical spatio-temporal transformer for video enhancement

Times cited: 0

Authors:

Jiang, Qin [1,2,3]

Wang, Qinglin [1,2,3]

Chi, Lihua [4]

Liu, Jie [1,2,3]

Affiliations:

[1] Natl Univ Def Technol, Changsha, Peoples R China

[2] Lab Digitizing Software Frontier Equipment, Changsha, Peoples R China

[3] Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China

[4] Hunan GuoKe Computil Technol Co Ltd, Changsha, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2025 / Vol. 309

Keywords:

Video super-resolution; Denoising; Deblurring; Transformer; Temporal;

DOI:

10.1016/j.knosys.2024.112773

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

The diversity and complexity of degradations in low-quality videos pose non-trivial challenges to video enhancement, which aims to reconstruct the high-quality counterparts. Prevailing sliding-window-based methods perform poorly because of their limited window size. Recurrent networks exploit long-term modeling to aggregate more information, resulting in significant performance improvements. However, most of them are trained on simply degraded data and can only handle specific degradations. To overcome this limitation, we propose a progressive alignment network, namely the Cross-scale Hierarchical Spatio-Temporal Transformer (CHSTT), which leverages cross-scale tokenization to generate multi-scale visual tokens over the entire sequence and thereby capture extensive long-range temporal dependencies. To enhance spatial and temporal interactions, we introduce a hierarchical Transformer that computes mutual multi-head attention across both the spatial and temporal dimensions. Quantitative and qualitative evaluations show that CHSTT outperforms several state-of-the-art methods on three distinct video enhancement tasks: video super-resolution, video denoising, and video deblurring.
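The abstract names two mechanisms: cross-scale tokenization over the whole frame sequence, and multi-head attention across the spatial and temporal dimensions. The Python/NumPy sketch below illustrates these general ideas only; it is not CHSTT's implementation, whose details are not given in this record. The patch sizes (2 and 4), the random linear projections, the identity Q/K/V projections, and the spatial-then-temporal factorization (commonly called divided space-time attention) are all illustrative assumptions, and the function names `tokenize`, `cross_scale_tokens`, `mha`, and `spatio_temporal_block` are hypothetical.

```python
import numpy as np

def tokenize(video, patch):
    """Split each frame of a (T, H, W, C) video into non-overlapping patch x patch
    tokens. Returns shape (T, N, patch*patch*C) with N = (H//patch) * (W//patch)."""
    T, H, W, C = video.shape
    assert H % patch == 0 and W % patch == 0, "frame size must be divisible by patch"
    x = video.reshape(T, H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, h, w, patch, patch, C)
    return x.reshape(T, (H // patch) * (W // patch), patch * patch * C)

def cross_scale_tokens(video, patches=(2, 4), dim=16, seed=0):
    """Hypothetical cross-scale tokenization (assumption, not the paper's design):
    tokenize every frame at several patch sizes, project each scale to a shared
    width `dim` with a random linear map, and concatenate along the token axis."""
    rng = np.random.default_rng(seed)
    scales = []
    for p in patches:
        tok = tokenize(video, p)
        proj = rng.standard_normal((tok.shape[-1], dim)) / np.sqrt(tok.shape[-1])
        scales.append(tok @ proj)
    return np.concatenate(scales, axis=1)  # (T, N_total, dim)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mha(x, heads):
    """Multi-head self-attention over the second-to-last axis of x (..., L, D).
    Q, K and V are the input itself (identity projections) to keep the sketch short."""
    *batch, L, D = x.shape
    d = D // heads
    h = np.moveaxis(x.reshape(*batch, L, heads, d), -2, -3)  # (..., heads, L, d)
    scores = h @ np.swapaxes(h, -1, -2) / np.sqrt(d)         # (..., heads, L, L)
    out = softmax(scores) @ h                                 # (..., heads, L, d)
    return np.moveaxis(out, -3, -2).reshape(*batch, L, D)

def spatio_temporal_block(tokens, heads=4):
    """Assumed factorization: spatial attention among tokens of the same frame,
    then temporal attention among tokens at the same index across all frames,
    each followed by a residual connection."""
    x = tokens + mha(tokens, heads)   # spatial: attend over N within each frame
    xt = np.swapaxes(x, 0, 1)         # (N, T, D): one sequence over time per token
    xt = xt + mha(xt, heads)          # temporal: attend over T for each token index
    return np.swapaxes(xt, 0, 1)      # back to (T, N, D)

video = np.random.default_rng(1).random((5, 8, 8, 3))  # T=5 frames of 8x8 RGB
tokens = cross_scale_tokens(video)
out = spatio_temporal_block(tokens)
print(tokens.shape, out.shape)  # (5, 20, 16) (5, 20, 16)
```

With 8x8 frames, patch size 2 yields 16 tokens per frame and patch size 4 yields 4, so the concatenated sequence holds 20 tokens per frame, and tokens from every frame of the clip are available to the temporal attention step. Factorizing attention into a spatial pass and a temporal pass keeps the cost proportional to N² + T² per position rather than (N·T)² for joint attention; whether CHSTT's hierarchical Transformer is organized this way is not stated here.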

Pages: 13