Cross-scale hierarchical spatio-temporal transformer for video enhancement

Cited by: 0

Authors:

Jiang, Qin ^{[1,2,3]}

Wang, Qinglin ^{[1,2,3]}

Chi, Lihua ^{[4]}

Liu, Jie ^{[1,2,3]}

Affiliations:

[1] Natl Univ Def Technol, Changsha, Peoples R China

[2] Lab Digitizing Software Frontier Equipment, Changsha, Peoples R China

[3] Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China

[4] Hunan GuoKe Computil Technol Co Ltd, Changsha, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2025 / Vol. 309

Keywords:

Video super-resolution; Denoising; Deblurring; Transformer; Temporal;

DOI:

10.1016/j.knosys.2024.112773

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The diversity and complexity of degradations in low-quality videos pose non-trivial challenges for video enhancement, which aims to reconstruct the high-quality counterparts. Prevailing sliding-window-based methods perform poorly because of their limited window size. Recurrent networks exploit long-term modeling to aggregate more information, resulting in significant performance improvements. However, most of them are trained on simply degraded data and can tackle only a specific degradation. To overcome this limitation, we propose a progressive alignment network, the Cross-scale Hierarchical Spatio-Temporal Transformer (CHSTT), which leverages cross-scale tokenization to generate multi-scale visual tokens over the entire sequence and thereby capture extensive long-range temporal dependencies. To enhance spatial and temporal interactions, we introduce a hierarchical Transformer that computes mutual multi-head attention across both the spatial and temporal dimensions. Quantitative and qualitative assessments show that CHSTT outperforms several state-of-the-art methods on three video enhancement tasks: video super-resolution, video denoising, and video deblurring.


Pages: 13
