Event-Based Monocular Depth Estimation With Recurrent Transformers

Cited by: 5

Authors:

Liu, Xu ^{[1,2]}

Li, Jianing ^{[3]}

Shi, Jinqiao ^{[4]}

Fan, Xiaopeng ^{[1,2]}

Tian, Yonghong ^{[2,3]}

Zhao, Debin ^{[1,2]}

Affiliations:

[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Interac, Dept Comp Sci & Technol, Harbin 150001, Peoples R China

[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China

[3] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China

[4] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100871, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 08

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Estimation; Cameras; Voltage control; Task analysis; Streaming media; Circuits and systems; Event camera; monocular depth estimator; recurrent transformer; cross attention; VISION;

DOI:

10.1109/TCSVT.2024.3378742

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Event cameras, offering high temporal resolution and high dynamic range, have brought a new perspective to common challenges in monocular depth estimation (e.g., motion blur and low light). However, existing CNN-based methods insufficiently exploit global spatial information from asynchronous events, while RNN-based methods show a limited capacity to exploit temporal cues for event-based monocular depth estimation. To this end, we propose an event-based monocular depth estimator with recurrent transformers, namely EReFormer. Technically, we first design a transformer-based encoder-decoder that utilizes multi-scale features to model global spatial information from events. Then, we propose a Gate Recurrent Vision Transformer (GRViT), which introduces a recursive mechanism into transformers, to leverage rich temporal cues from events. Finally, we present a Cross Attention-guided Skip Connection (CASC), which performs cross attention to fuse multi-scale features, to improve global spatial modeling capabilities. The experimental results show that our EReFormer outperforms state-of-the-art methods by a margin on both synthetic and real-world datasets. Our open-source code is available at https://github.com/liuxu0303/EReFormer.

Pages: 7417-7429

Page count: 13
