Triplane-Smoothed Video Dehazing with CLIP-Enhanced Generalization

Times cited: 0
Authors
Ren, Jingjing [1]
Chen, Haoyu [1]
Ye, Tian [1]
Wu, Hongtao [1]
Zhu, Lei [1, 2]
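Affiliations
[1] Hong Kong University of Science and Technology (Guangzhou), Robotics and Autonomous Systems Thrust, Guangzhou 511400, Guangdong, People's Republic of China
[2] Hong Kong University of Science and Technology, Department of Electronic and Computer Engineering, Hong Kong, People's Republic of China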
Funding
National Natural Science Foundation of China
Keywords
Video dehazing; Spatial-temporal consistency; Triplane; CLIP; Generalization; Image
DOI
10.1007/s11263-024-02161-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video dehazing is a critical research area in computer vision that aims to enhance the quality of hazy frames, which benefits many downstream tasks, e.g., semantic segmentation. Recent works devise CNN-based structures or attention mechanisms to fuse temporal information, while others use offsets between frames to align them explicitly. Another significant line of video dehazing research focuses on constructing paired datasets, either by synthesizing fog effects on clear videos or by generating real haze in indoor scenes. Despite the significant contributions of these dehazing networks and datasets to the advancement of video dehazing, current methods still suffer from spatial-temporal inconsistency and poor generalization ability. We address these issues by proposing a triplane smoothing module that explicitly exploits the spatial-temporal smoothness prior of the input video to generate temporally coherent dehazing results. We further devise a query-based decoder that extracts haze-relevant information while also aggregating temporal clues implicitly. To improve the generalization ability of our dehazing model, we use CLIP guidance, which provides a rich, high-level understanding of haze effects. We conduct extensive experiments verifying that our model generates spatially and temporally consistent dehazing results and produces visually pleasing dehazing results on real-world data.
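The abstract does not say how the triplane smoothing module is constructed. As a rough, hypothetical illustration of the general idea of a triplane smoothness prior, the sketch below mean-pools a (T, H, W) video volume onto its HW, TH, and TW planes, applies a temporal moving average to the two planes that contain the time axis, and rebuilds the volume by broadcast averaging. Mean pooling, the moving-average filter, the `radius` parameter, and every function name here are assumptions made for this example; it is not the authors' module.

```python
import numpy as np


def triplane_project(volume: np.ndarray):
    """Factor a (T, H, W) volume into three axis-aligned planes.

    Assumption for illustration: planes are built by mean pooling.
    """
    plane_hw = volume.mean(axis=0)  # (H, W): averaged over time
    plane_th = volume.mean(axis=2)  # (T, H): averaged over width
    plane_tw = volume.mean(axis=1)  # (T, W): averaged over height
    return plane_hw, plane_th, plane_tw


def smooth_time(plane: np.ndarray, radius: int = 1) -> np.ndarray:
    """Moving average along axis 0 (time), replicating edge frames."""
    padded = np.pad(plane, ((radius, radius), (0, 0)), mode="edge")
    kernel_size = 2 * radius + 1
    out = np.zeros_like(plane, dtype=float)
    for k in range(kernel_size):
        out += padded[k:k + plane.shape[0]]
    return out / kernel_size


def triplane_reconstruct(plane_hw, plane_th, plane_tw) -> np.ndarray:
    """Rebuild a (T, H, W) volume by broadcasting and averaging the planes."""
    return (plane_hw[None, :, :] + plane_th[:, :, None] + plane_tw[:, None, :]) / 3.0


def triplane_smooth(volume: np.ndarray, radius: int = 1) -> np.ndarray:
    """Hypothetical triplane smoothing: project, smooth in time, rebuild."""
    plane_hw, plane_th, plane_tw = triplane_project(volume)
    # Only the planes that contain the time axis are smoothed temporally.
    plane_th = smooth_time(plane_th, radius)
    plane_tw = smooth_time(plane_tw, radius)
    return triplane_reconstruct(plane_hw, plane_th, plane_tw)


if __name__ == "__main__":
    # Alternating +1/-1 frames simulate temporal flicker.
    flicker = np.stack([np.full((3, 2), 1.0 if t % 2 == 0 else -1.0) for t in range(4)])
    smoothed = triplane_smooth(flicker)
    # Largest frame-to-frame change drops from 2.0 to about 0.44.
    print(np.abs(np.diff(flicker, axis=0)).max(), np.abs(np.diff(smoothed, axis=0)).max())
```

Because the rebuilt volume is an average of three low-dimensional planes, it only approximates the input (it is exact for a constant volume); that low-dimensional bottleneck, together with the temporal filter, is what suppresses frame-to-frame flicker in this toy example.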
Pages: 475-488
Number of pages: 14
Related papers
6 in total
  • [1] Exploring a CLIP-Enhanced Automated Approach for Video Description Generation
    Zhang, Siang-Ling
    Cheng, Huai-Hsun
    Chen, Yen-Hsin
    Yeh, Mei-Chen
    2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC, 2023: 1506-1511
  • [2] CLIP-enhanced multimodal machine translation: integrating visual and label features with transformer fusion
    Cui, ShaoDong
    Yin, Xinyan
    Duan, Kaibo
    Shinnou, Hiroyuki
    Multimedia Tools and Applications, 2025, 84 (14): 12699-12713
  • [3] VCLIPSeg: Voxel-Wise CLIP-Enhanced Model for Semi-supervised Medical Image Segmentation
    Li, Lei
    Lian, Sheng
    Luo, Zhiming
    Wang, Beizhan
    Li, Shaozi
    Medical Image Computing and Computer Assisted Intervention - MICCAI 2024, Pt IX, 2024, 15009: 692-701
  • [4] ST-CLIP: Spatio-Temporal Enhanced CLIP Towards Dense Video Captioning
    Chen, Huimin
    Duan, Pengfei
    Huang, Mingru
    Guo, Jingyi
    Xiong, Shengwu
    Advanced Intelligent Computing Technology and Applications, Pt XI, ICIC 2024, 2024, 14872: 396-407
  • [5] Video and image quality enhancement using an enhanced lower bound on transmission map dehazing technique
    Ayoub, Abeer
    El-Shafai, Walid
    Abd El-Samie, Fathi E.
    Hamad, Ehab K. I.
    El-Rabaie, S.
    Multimedia Systems, 2025, 31 (2)
  • [6] High quality dehazed image and video based on enhanced multi-scale guided filtering dehazing technique
    Ayoub, Abeer
    El-Shafai, Walid
    Abd El-Samie, Fathi E.
    Hamad, Ehab K. I.
    El-Rabaie, El-Sayed M.
    Cluster Computing, 2025, 28 (5)