Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer

Cited by: 3
Authors
Lv, Xiaoqian [1 ]
Zhang, Shengping [1 ]
Wang, Chenyang [1 ]
Zhang, Weigang [1 ]
Yao, Hongxun [2 ]
Huang, Qingming [3 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Weihai 264209, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Peoples R China
[3] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Low-light video enhancement; unsupervised learning; curve estimation; transformer; IMAGE QUALITY ASSESSMENT; REPRESENTATION; FRAMEWORK; ALGORITHM;
DOI
10.1109/TIP.2023.3301332
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing low-light video enhancement methods are dominated by Convolutional Neural Networks (CNNs) trained in a supervised manner. Because paired dynamic low-/normal-light videos are difficult to collect in real-world scenes, these methods are usually trained on synthetic, static, and uniform-motion videos, which undermines their generalization to real-world scenes. Additionally, they typically suffer from temporal inconsistency (e.g., flickering artifacts and motion blur) when handling large-scale motions, since the local perception property of CNNs limits their ability to model long-range dependencies in both the spatial and temporal domains. To address these problems, we propose, to the best of our knowledge, the first unsupervised method for low-light video enhancement, named LightenFormer, which models long-range intra- and inter-frame dependencies with a spatial-temporal co-attention transformer to enhance brightness while maintaining temporal consistency. Specifically, an effective yet lightweight S-curve Estimation Network (SCENet) is first proposed to estimate pixel-wise S-shaped non-linear curves (S-curves) that adaptively adjust the dynamic range of an input video. Next, to model the temporal consistency of the video, we present a Spatial-Temporal Refinement Network (STRNet) to refine the enhanced video. The core module of STRNet is a novel Spatial-Temporal Co-attention Transformer (STCAT), which exploits multi-scale self- and cross-attention interactions to capture long-range correlations in both the spatial and temporal domains among frames for implicit motion estimation. To achieve unsupervised training, we further propose two non-reference loss functions based on the invertibility of the S-curve and the noise independence among frames. Extensive experiments on the SDSD and LLIV-Phone datasets demonstrate that our LightenFormer outperforms state-of-the-art methods.
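To illustrate the curve-adjustment and invertibility ideas the abstract describes, the following is a minimal PyTorch sketch, not the authors' implementation: it uses a Schlick-style bias/gain curve as a stand-in for the paper's S-curve, and apply_s_curve, invert_s_curve, and the per-pixel alpha map (standing in for SCENet's predicted curve parameters) are hypothetical names; the paper's exact parameterization may differ. The closed-form inverse shows the monotonicity/invertibility property that an invertibility-based non-reference loss could build on.

    import torch

    def apply_s_curve(x, alpha, eps=1e-6):
        # Schlick-style bias/gain tone curve on an image in [0, 1].
        # alpha is a per-pixel exponent map (> 0): alpha < 1 lifts dark
        # pixels (brightening), alpha > 1 suppresses them.
        x = x.clamp(eps, 1.0 - eps)
        num = x ** alpha
        return num / (num + (1.0 - x) ** alpha)

    def invert_s_curve(y, alpha, eps=1e-6):
        # Closed-form inverse of apply_s_curve: the curve is strictly
        # monotonic for alpha > 0, so it is invertible, which is the
        # property an invertibility-based non-reference loss relies on.
        y = y.clamp(eps, 1.0 - eps)
        r = (y / (1.0 - y)) ** (1.0 / alpha)
        return r / (1.0 + r)

    frame = 0.3 * torch.rand(1, 3, 64, 64)      # synthetic dark frame
    alpha = torch.full_like(frame, 0.5)         # stand-in for SCENet's output
    enhanced = apply_s_curve(frame, alpha)      # brightened frame
    recon = invert_s_curve(enhanced, alpha)     # map back to the input range
    print((recon - frame).abs().max().item())   # approx. 0: round-trip holds

Predicting a per-pixel alpha map rather than one global exponent is what lets the curve adapt the dynamic range locally, as the abstract describes for SCENet.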
Pages: 4701-4715
Number of pages: 15
Related papers
50 records in total
  • [1] DSFormer: Leveraging Transformer with Cross-Modal Attention for Temporal Consistency in Low-Light Video Enhancement
    Xu, JiaHao
    Mei, ShuHao
    Chen, ZiZheng
    Zhang, DanNi
    Shi, Fan
    Zhao, Meng
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XI, ICIC 2024, 2024, 14872 : 27 - 38
  • [2] Temporal-Spatial Filtering for Enhancement of Low-Light Surveillance Video
    Guo, Fan
    Tang, Jin
    Peng, Hui
    Zou, Beiji
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2016, 20 (04) : 652 - 661
  • [3] Temporally Consistent Enhancement of Low-Light Videos via Spatial-Temporal Compatible Learning
    Zhu, Lingyu
    Yang, Wenhan
    Chen, Baoliang
    Zhu, Hanwei
    Meng, Xiandong
    Wang, Shiqi
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (10) : 4703 - 4723
  • [4] Adaptive Locally-Aligned Transformer for low-light video enhancement
    Cao, Yiwen
    Su, Yukun
    Deng, Jingliang
    Zhang, Yu
    Wu, Qingyao
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 240
  • [5] Video Description with Spatial-Temporal Attention
    Tu, Yunbin
    Zhang, Xishan
    Liu, Bingtao
    Yan, Chenggang
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1014 - 1022
  • [6] Scene Retrieval in Soccer Videos by Spatial-temporal Attention with Video Vision Transformer
    Gan, Yaozong
    Togo, Ren
    Ogawa, Takahiro
    Haseyama, Miki
    2022 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN, IEEE ICCE-TW 2022, 2022, : 453 - 454
  • [7] Collaborative spatial-temporal video salient object detection with cross attention transformer
    Su, Yuting
    Wang, Weikang
    Liu, Jing
    Jing, Peiguang
    SIGNAL PROCESSING, 2024, 224
  • [8] Spatial-Temporal Sequence Attention Based Efficient Transformer for Video Snow Removal
    Gao, Tao
    Zhang, Qianxi
    Chen, Ting
    Wen, Yuanbo
    BIG DATA MINING AND ANALYTICS, 2025, 8 (03): : 551 - 562
  • [9] MAGAN: Unsupervised Low-Light Image Enhancement Guided by Mixed-Attention
    Wang, Renjun
    Jiang, Bin
    Yang, Chao
    Li, Qiao
    Zhang, Bolin
    BIG DATA MINING AND ANALYTICS, 2022, 5 (02) : 110 - 119