Cross-scale hierarchical spatio-temporal transformer for video enhancement

Times cited: 0

Authors:

Jiang, Qin [1,2,3]

Wang, Qinglin [1,2,3]

Chi, Lihua [4]

Liu, Jie [1,2,3]

Affiliations:

[1] Natl Univ Def Technol, Changsha, Peoples R China

[2] Lab Digitizing Software Frontier Equipment, Changsha, Peoples R China

[3] Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China

[4] Hunan GuoKe Computil Technol Co Ltd, Changsha, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2025 / Vol. 309

Keywords:

Video super-resolution; Denoising; Deblurring; Transformer; Temporal

DOI:

10.1016/j.knosys.2024.112773

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The diversity and complexity of degradations in low-quality videos pose non-trivial challenges for video enhancement, which aims to reconstruct their high-quality counterparts. Prevailing sliding-window-based methods perform poorly because of their limited window size. Recurrent networks exploit long-term modeling to aggregate more information, resulting in significant performance improvements. However, most of them are trained on simply degraded data and can only handle a specific type of degradation. To overcome this limitation, we propose a progressive alignment network, namely the Cross-scale Hierarchical Spatio-Temporal Transformer (CHSTT), which leverages cross-scale tokenization to generate multi-scale visual tokens across the entire sequence and thereby capture extensive long-range temporal dependencies. To strengthen spatial and temporal interactions, we introduce a novel hierarchical Transformer that computes mutual multi-head attention across both the spatial and temporal dimensions. Quantitative and qualitative assessments show that CHSTT outperforms several state-of-the-art methods on three distinct video enhancement tasks: video super-resolution, video denoising, and video deblurring.
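The abstract names two mechanisms: cross-scale tokenization of every frame in the sequence, and multi-head attention that mixes information across space and time. Below is a minimal NumPy sketch of those two ideas, assuming a (T, H, W, C) video array, non-overlapping square patches, and random projection matrices standing in for learned weights. The function names `cross_scale_tokens` and `multi_head_attention` are hypothetical. This is not the CHSTT implementation: it replaces the paper's hierarchical, progressive-alignment design with a single joint attention over all tokens.

```python
import numpy as np

def cross_scale_tokens(video, patch_sizes, dim, rng):
    """Split every frame of a (T, H, W, C) video into non-overlapping p x p
    patches for each p in patch_sizes, then project each patch to a
    `dim`-dimensional token. Returns (N, dim) tokens covering all frames and scales."""
    T, H, W, C = video.shape
    all_tokens = []
    for p in patch_sizes:
        assert H % p == 0 and W % p == 0, "frame size must be divisible by patch size"
        # (T, H, W, C) -> (T, H/p, p, W/p, p, C) -> (T, H/p, W/p, p, p, C)
        patches = video.reshape(T, H // p, p, W // p, p, C).transpose(0, 1, 3, 2, 4, 5)
        patches = patches.reshape(-1, p * p * C)  # one row per patch
        # Assumption: a random projection stands in for a learned per-scale embedding.
        proj = rng.standard_normal((p * p * C, dim)) / np.sqrt(p * p * C)
        all_tokens.append(patches @ proj)
    return np.concatenate(all_tokens, axis=0)

def multi_head_attention(x, num_heads, rng):
    """Multi-head self-attention over all tokens jointly, so each token can attend
    to tokens from any frame (temporal) and any location or scale (spatial).
    Assumption: random weight matrices instead of trained ones."""
    n, dim = x.shape
    assert dim % num_heads == 0, "dim must be divisible by num_heads"
    hd = dim // num_heads
    wq, wk, wv, wo = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(4))

    def split(m):
        # (n, dim) -> (heads, n, head_dim)
        return m.reshape(n, num_heads, hd).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)    # (heads, n, n)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    out = (weights @ v).transpose(1, 0, 2).reshape(n, dim)  # merge heads
    return out @ wo, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.standard_normal((3, 8, 8, 3))  # T=3 frames, 8x8, 3 channels
    tokens = cross_scale_tokens(video, patch_sizes=(2, 4), dim=16, rng=rng)
    out, attn = multi_head_attention(tokens, num_heads=4, rng=rng)
    print(tokens.shape, out.shape, attn.shape)  # (60, 16) (60, 16) (4, 60, 60)
```

With 2x2 patches each 8x8 frame yields 16 tokens and with 4x4 patches it yields 4, so three frames give 3 x (16 + 4) = 60 tokens in one shared sequence. Because every attention row spans all frames and both scales, full joint attention costs O(N^2) in the token count N and only scales to small inputs like this one.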

Pages: 13