Multi-Swin Transformer Based Spatio-Temporal Information Exploration for Compressed Video Quality Enhancement

Cited by: 0
Authors
Yu, Li [1, 2]
Wu, Shiyu [3]
Gabbouj, Moncef [4]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Comp Sci, Nanjing 211544, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Jiangsu Collaborat Innovat Ctr Atmospher Environm, Nanjing 211544, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Software, Nanjing 211544, Peoples R China
[4] Tampere Univ, Dept Comp Sci, Tampere 33100, Finland
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Convolution; Video recording; Quality assessment; Motion compensation; Feature extraction; Correlation; Compressed video quality enhancement; Spatio-temporal information; Swin transformer;
DOI
10.1109/LSP.2024.3429008
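Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract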
Spatio-temporal information plays an important role in compressed video quality enhancement. Most advanced studies use deformable convolution or the Swin transformer to explore spatio-temporal information. However, deformable convolution-based methods may produce inaccurate motion compensation because of compression artifacts and limited receptive fields. Swin transformer-based approaches cannot fully explore spatio-temporal information because they are limited by their rigid window-based mechanism. To solve these problems, we propose a novel multi-Swin transformer-based network for compressed video quality enhancement that better explores spatio-temporal information. The workflow consists of the Local Alignment (LA) module, the Global Refinement Fusion (GRF) module, and the Quality Enhancement (QE) module. The LA module roughly perceives local motion through deformable fusion. Subsequently, the GRF module employs the proposed multi-Swin transformer to enhance spatio-temporal perception. Finally, the QE module restores texture details across various scales. Extensive experimental results demonstrate the effectiveness of the proposed method.
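The "rigid window-based mechanism" named in the abstract can be illustrated with a small sketch of generic shifted-window partitioning, as used by Swin transformers in general. This is not the authors' multi-Swin design, which this record does not describe. The function names, the 4x4 grid, and the window size of 2 are assumptions chosen for illustration, and the attention mask that real implementations apply to wrapped-around windows is omitted.

```python
def window_partition(grid, window):
    """Split a 2D grid (list of rows) into non-overlapping window x window blocks."""
    height, width = len(grid), len(grid[0])
    assert height % window == 0 and width % window == 0
    blocks = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            blocks.append([grid[r][left:left + window]
                           for r in range(top, top + window)])
    return blocks


def cyclic_shift(grid, shift):
    """Roll the grid up and left by `shift` positions (wrapping at the borders)."""
    height, width = len(grid), len(grid[0])
    return [[grid[(r + shift) % height][(c + shift) % width]
             for c in range(width)]
            for r in range(height)]


def window_of(grid, window, shift=0):
    """Map each position id to the index of the window it falls into."""
    shifted = cyclic_shift(grid, shift) if shift else grid
    mapping = {}
    for index, block in enumerate(window_partition(shifted, window)):
        for row in block:
            for position in row:
                mapping[position] = index
    return mapping


# Hypothetical 4x4 feature map whose entries are position ids 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

regular = window_of(grid, 2)           # fixed windows: attention stays inside each block
shifted = window_of(grid, 2, shift=1)  # shifted windows: neighbours across a border meet

# Positions 5 and 6 are horizontal neighbours on either side of a window border.
print(regular[5] == regular[6])  # they are in different windows without the shift
print(shifted[5] == shifted[6])  # they share a window after the shift
```

With fixed windows, positions 5 and 6 never attend to each other even though they are adjacent. The shift brings them into the same window, but the grouping is still decided by a fixed grid rather than by the video content.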
Pages: 1880-1884
Number of pages: 5