Multi-modal degradation feature learning for unified image restoration based on contrastive learning

Cited: 0
Authors
Chen, Lei [1 ]
Xiong, Qingbo [1 ]
Zhang, Wei [1 ,2 ]
Liang, Xiaoli [1 ]
Gan, Zhihua [1 ]
Li, Liqiang [3 ]
He, Xin [1 ]
Institutions
[1] Henan Univ, Sch Software, Jinming Rd, Kaifeng 475004, Peoples R China
[2] China Univ Labor Relat, Sch Appl Technol, Zengguang Rd, Beijing 100048, Peoples R China
[3] Shangqiu Normal Univ, Sch Phys, Shangqiu 476000, Peoples R China
Funding
US National Science Foundation;
Keywords
Unified image restoration; Multi-modal features; Contrastive learning; Deep learning;
DOI
10.1016/j.neucom.2024.128955
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Numbers
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we address the unified image restoration challenge by reframing it as a contrastive-learning-based classification problem. Despite the significant strides made by deep learning methods in enhancing image restoration quality, their limited capacity to generalize across diverse degradation types and intensities necessitates training a separate model for each specific degradation scenario. We propose an all-encompassing approach that can restore images from various unknown corruption types and levels. We devise a method that learns representations of the latent sharp image's degradation and accompanying textual features (such as dataset categories and image content descriptions), converting these into prompts that are then embedded within a reconstruction network to enhance cross-database restoration performance. This culminates in a unified image reconstruction framework. The study involves two stages. In the first stage, we design a MultiContentNet that learns multi-modal features (MMFs) of the latent sharp image. This network encodes visual degradation expressions and contextual text features into latent variables, thereby exerting a guided classification effect. Specifically, MultiContentNet is trained as an auxiliary controller that takes the degraded input image and, through contrastive learning, extracts MMFs of the latent target image, effectively yielding natural classifiers tailored to different degradation types. The second stage integrates the learned MMFs into an image restoration network via cross-attention mechanisms, guiding the restoration model to learn high-fidelity image recovery. Experiments conducted on six blind image restoration tasks demonstrate that the proposed method achieves state-of-the-art performance, highlighting the potential significance of the MMFs of large-scale pretrained vision-language models in advancing high-quality unified image reconstruction.
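The second stage's cross-attention injection can be illustrated with a minimal sketch. This is not the authors' code: the function name `cross_attention`, the random projection matrices standing in for learned weights, and the toy shapes are all assumptions for demonstration. It only shows the general mechanism the abstract describes, namely image features forming the queries while the learned MMF prompt tokens supply the keys and values.

```python
# Illustrative sketch (assumed names and shapes, not the paper's implementation):
# how multi-modal degradation features (MMFs) from stage one could steer a
# restoration network through cross-attention in stage two.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_feats, mmf_prompts, d_k=8, seed=0):
    """Image features attend to MMF prompt tokens.

    img_feats:   (N, C) flattened spatial features of the degraded image
    mmf_prompts: (M, C) degradation/text prompt embeddings from stage one
    Returns (N, C) features modulated by the prompts via a residual update.
    """
    rng = np.random.default_rng(seed)
    C = img_feats.shape[1]
    # Random projections stand in for learned weights W_q, W_k, W_v, W_o.
    W_q, W_k, W_v = (rng.standard_normal((C, d_k)) for _ in range(3))
    W_o = rng.standard_normal((d_k, C))
    Q = img_feats @ W_q                       # queries from the restoration branch
    K = mmf_prompts @ W_k                     # keys from the MMF prompts
    V = mmf_prompts @ W_v                     # values from the MMF prompts
    attn = softmax(Q @ K.T / np.sqrt(d_k))    # (N, M) attention over prompt tokens
    return img_feats + attn @ V @ W_o         # residual injection of prompt info

feats = np.ones((4, 16))      # toy "image" feature tokens
prompts = np.ones((3, 16))    # toy MMF prompt tokens
out = cross_attention(feats, prompts)
print(out.shape)  # (4, 16)
```

Because the update is residual, the restoration branch degrades gracefully when the prompts carry little information, which is one common motivation for this injection pattern.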
Pages: 11
Related Papers
50 records
  • [31] Multi-modal Network Representation Learning
    Zhang, Chuxu
    Jiang, Meng
    Zhang, Xiangliang
    Ye, Yanfang
    Chawla, Nitesh V.
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 3557 - 3558
  • [32] CIRF: Coupled Image Reconstruction and Fusion Strategy for Deep Learning Based Multi-Modal Image Fusion
    Zheng, Junze
    Xiao, Junyan
    Wang, Yaowei
    Zhang, Xuming
    SENSORS, 2024, 24 (11)
  • [33] Multi-Modal Object Tracking and Image Fusion With Unsupervised Deep Learning
    LaHaye, Nicholas
    Ott, Jordan
    Garay, Michael J.
    El-Askary, Hesham Mohamed
    Linstead, Erik
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2019, 12 (08) : 3056 - 3066
  • [34] A Cross-modal image retrieval method based on contrastive learning
    Zhou, Wen
    JOURNAL OF OPTICS-INDIA, 2024, 53 (03): : 2098 - 2107
  • [35] TRANSFORMER-BASED MULTI-MODAL LEARNING FOR MULTI-LABEL REMOTE SENSING IMAGE CLASSIFICATION
    Hoffmann, David Sebastian
    Clasen, Kai Norman
    Demir, Begum
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 4891 - 4894
  • [36] Multi-modal feature fusion for geographic image annotation
    Li, Ke
    Zou, Changqing
    Bu, Shuhui
    Liang, Yun
    Zhang, Jian
    Gong, Minglun
    PATTERN RECOGNITION, 2018, 73 : 1 - 14
  • [37] RGB-D Scene Classification via Multi-modal Feature Learning
    Cai, Ziyun
    Shao, Ling
    COGNITIVE COMPUTATION, 2019, 11 : 825 - 840
  • [38] Towards Multi-modal Anatomical Landmark Detection for Ultrasound-Guided Brain Tumor Resection with Contrastive Learning
    Salari, Soorena
    Rasoulian, Amirhossein
    Rivaz, Hassan
    Xiao, Yiming
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT IX, 2023, 14228 : 668 - 678
  • [39] RGB-D Scene Classification via Multi-modal Feature Learning
    Cai, Ziyun
    Shao, Ling
    COGNITIVE COMPUTATION, 2019, 11 (06) : 825 - 840
  • [40] CCGN: consistency contrastive-learning graph network for multi-modal fake news detection
    Cui, Shaodong
    Duan, Kaibo
    Ma, Wen
    Shinnou, Hiroyuki
    MULTIMEDIA SYSTEMS, 2025, 31 (02)