A compressed video quality enhancement algorithm based on CNN and transformer hybrid network

Cited by: 1
Authors
Li, Hao [1]
He, Xiaohai [1]
Xiong, Shuhua [1]
He, Haibo [2]
Chen, Honggang [1]
Affiliations
[1] Sichuan Univ, Sch Elect Informat, Chengdu 610000, Sichuan, Peoples R China
[2] Chengdu Xitu Technol Co Ltd Org, Chengdu 610000, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Quality enhancement; Compressed video; Transformer; Deep learning
DOI
10.1007/s11227-024-06654-0
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Convolutional neural network (CNN)-based algorithms perform well at enhancing video quality by removing artifacts from compressed videos. Existing state-of-the-art approaches concentrate primarily on exploiting spatiotemporal details from neighboring frames through deformable convolution. However, the offset fields in deformable convolution are difficult to train: their instability during training frequently causes offset overflow, which reduces the efficiency of correlation modeling. Moreover, convolution alone is insufficient for effectively modeling long-range dependencies. We introduce a CNN and transformer-based compressed video quality enhancement (CTVE) method comprising three modules: the feature initial processing (FIP) module, the feature further processing (FFP) module, and the reconstruction module. The FIP module is built on deformable convolution (DCN), which lets it perform an initial extraction of spatiotemporal information from neighboring frames. The FFP module is based on the Swin Transformer V2, which can accurately model the relevant contextual information and adapts well to image content. Extensive experiments on the JCT-VC test sequences demonstrate that our method achieves outstanding average performance in both subjective and objective quality assessments.
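To make the offset overflow mentioned in the abstract concrete, the following is a minimal pure-Python sketch of how a single deformable convolution output is computed, following the general formulation of Dai et al. (sampling at grid position plus offset with bilinear interpolation). It is not the authors' FIP module: the function names `bilinear_sample` and `deformable_conv_at`, the single-channel 2-D list input, and the hand-set offsets are all illustrative assumptions. In a real network the offsets are predicted by a learned layer and the features are multi-channel tensors.

```python
import math

def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2-D list `feat` at fractional position (y, x).
    Neighbors that fall outside the map contribute zero."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * feat[yi][xi]
    return value

def deformable_conv_at(feat, kernel, offsets, cy, cx):
    """Response of a 3x3 deformable convolution centered at (cy, cx).
    offsets[k] = (dy, dx) shifts the k-th kernel tap (row-major order).
    Hypothetical helper for illustration only."""
    out = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            out += kernel[ky + 1][kx + 1] * bilinear_sample(
                feat, cy + ky + dy, cx + kx + dx)
            k += 1
    return out

# Toy 4x4 feature map and a kernel that reads only the center tap.
feat = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
zero = [(0.0, 0.0)] * 9

shifted = list(zero)
shifted[4] = (0.5, 0.5)      # small, well-behaved offset on the center tap
overflow = list(zero)
overflow[4] = (10.0, 10.0)   # offset that leaves the frame entirely

print(deformable_conv_at(feat, kernel, zero, 1, 1))      # regular convolution
print(deformable_conv_at(feat, kernel, shifted, 1, 1))   # interpolated sample
print(deformable_conv_at(feat, kernel, overflow, 1, 1))  # overflowed sample
```

With zero offsets the operation reduces to an ordinary convolution. Once an offset pushes a sampling point outside the frame, that tap reads nothing useful, which is one way to picture why unstable offsets weaken correlation modeling.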
Pages: 21