Multi-Swin Transformer Based Spatio-Temporal Information Exploration for Compressed Video Quality Enhancement

Cited by: 0

Authors:

Yu, Li [1,2]

Wu, Shiyu [3]

Gabbouj, Moncef [4]

Affiliations:

[1] Nanjing Univ Informat Sci & Technol, Sch Comp Sci, Nanjing 211544, Peoples R China

[2] Nanjing Univ Informat Sci & Technol, Jiangsu Collaborat Innovat Ctr Atmospher Environm, Nanjing 211544, Peoples R China

[3] Nanjing Univ Informat Sci & Technol, Sch Software, Nanjing 211544, Peoples R China

[4] Tampere Univ, Dept Comp Sci, Tampere 33100, Finland

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Convolution; Video recording; Quality assessment; Motion compensation; Feature extraction; Correlation; Compressed video quality enhancement; spatio-temporal information; swin transformer;

DOI:

10.1109/LSP.2024.3429008

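Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808; 0809;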

Abstract:

Spatio-temporal information plays an important role in compressed video quality enhancement. Most advanced studies use deformable convolution or the Swin transformer to explore spatio-temporal information. However, deformable-convolution-based methods may incur inaccurate motion compensation due to compression artifacts and limited receptive fields. Swin-transformer-based approaches are unable to fully explore the spatio-temporal information, limited by their rigid window-based mechanism. To solve the above problems, we propose a novel multi-Swin transformer-based network for compressed video quality enhancement to better explore spatio-temporal information. The whole workflow consists of the Local Alignment (LA) Module, the Global Refinement Fusion (GRF) Module, and the Quality Enhancement (QE) Module. The LA module roughly perceives the local motion through deformable fusion. Subsequently, the GRF module employs the proposed multi-Swin transformer to enhance the spatio-temporal perception. Finally, the QE module effectively restores the texture details across various scales. Extensive experimental results demonstrate the effectiveness of the proposed method.
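
The "rigid window-based mechanism" mentioned in the abstract can be illustrated with a minimal pure-Python sketch of the window partitioning and cyclic shift used by the standard Swin Transformer. This is not the paper's multi-Swin design, which the abstract does not specify; the function names `window_partition` and `cyclic_shift` and the 8x8 toy frame are assumptions chosen for illustration only.

```python
def window_partition(frame, window_size):
    """Split a 2-D grid (list of rows) into non-overlapping square windows.

    Returns the windows in row-major order; each window is a list of rows.
    Illustrative helper (hypothetical name), not code from the paper.
    """
    height, width = len(frame), len(frame[0])
    if height % window_size or width % window_size:
        raise ValueError("frame size must be a multiple of window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([row[left:left + window_size]
                            for row in frame[top:top + window_size]])
    return windows


def cyclic_shift(frame, shift):
    """Roll the grid up and left by `shift` positions (wrapping around),
    the step standard Swin applies before shifted-window attention."""
    rolled_rows = frame[shift:] + frame[:shift]
    return [row[shift:] + row[:shift] for row in rolled_rows]


# Toy 8x8 "frame" whose entries are flat pixel indices (row * 8 + column).
frame = [[r * 8 + c for c in range(8)] for r in range(8)]

# Regular 4x4 windows: pixels 3 and 4 of row 0 fall into different windows,
# so window-local attention cannot relate them directly.
regular = window_partition(frame, 4)

# After a cyclic shift by half the window size, columns 2..5 share a window.
shifted = window_partition(cyclic_shift(frame, 2), 4)
```

In the regular partition, attention is confined to each fixed window; the shift moves the window boundaries so that neighbouring pixels previously separated by a boundary can interact in the next layer. The abstract argues that this fixed scheme still limits spatio-temporal exploration, which motivates the proposed multi-Swin transformer.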

Pages: 1880-1884

Page count: 5
