Triplane-Smoothed Video Dehazing with CLIP-Enhanced Generalization

Cited by: 0

Authors:

Ren, Jingjing [1]

Chen, Haoyu [1]

Ye, Tian [1]

Wu, Hongtao [1]

Zhu, Lei [1,2]

Affiliations:

[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst Thrust, Guangzhou 511400, Guangdong, Peoples R China

[2] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2025 / Vol. 133 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Video dehazing; Spatial-temporal consistency; Triplane; CLIP; Generalization; IMAGE;

DOI:

10.1007/s11263-024-02161-0

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video dehazing is a critical research area in computer vision that aims to enhance the quality of hazy frames, which benefits many downstream tasks, e.g., semantic segmentation. Recent works devise CNN-based structures or attention mechanisms to fuse temporal information, while others use offsets between frames to align them explicitly. Another significant line of video dehazing research focuses on constructing paired datasets by synthesizing fog effects on clear videos or generating real haze in indoor scenes. Despite the significant contributions of these dehazing networks and datasets to the advancement of video dehazing, current methods still suffer from spatial-temporal inconsistency and poor generalization ability. We address these issues by proposing a triplane smoothing module that explicitly exploits the spatial-temporal smoothness prior of the input video to generate temporally coherent dehazing results. We further devise a query-based decoder that extracts haze-relevant information while also aggregating temporal clues implicitly. To improve the generalization ability of our dehazing model, we utilize CLIP guidance, which provides a rich, high-level understanding of haze effects. Extensive experiments verify that our model generates spatially and temporally consistent dehazing results and produces visually pleasing results on real-world data.


Pages: 475-488

Page count: 14
