Event-Based Monocular Depth Estimation With Recurrent Transformers

Cited by: 5
Authors
Liu, Xu [1 ,2 ]
Li, Jianing [3 ]
Shi, Jinqiao [4 ]
Fan, Xiaopeng [1 ,2 ]
Tian, Yonghong [2 ,3 ]
Zhao, Debin [1 ,2 ]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Interac, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[3] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China
[4] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Estimation; Cameras; Voltage control; Task analysis; Streaming media; Circuits and systems; Event camera; monocular depth estimator; recurrent transformer; cross attention; VISION;
DOI
10.1109/TCSVT.2024.3378742
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Event cameras, offering high temporal resolution and high dynamic range, have brought a new perspective to common challenges in monocular depth estimation (e.g., motion blur and low light). However, existing CNN-based methods insufficiently exploit global spatial information from asynchronous events, while RNN-based methods show a limited capacity to utilize temporal cues effectively for event-based monocular depth estimation. To this end, we propose an event-based monocular depth estimator with recurrent transformers, namely EReFormer. Technically, we first design a transformer-based encoder-decoder that utilizes multi-scale features to model global spatial information from events. Then, we propose a Gate Recurrent Vision Transformer (GRViT), which introduces a recursive mechanism into transformers to leverage rich temporal cues from events. Finally, we present a Cross Attention-guided Skip Connection (CASC), which performs cross attention to fuse multi-scale features and improve global spatial modeling. The experimental results show that our EReFormer outperforms state-of-the-art methods by a clear margin on both synthetic and real-world datasets. Our open-source code is available at https://github.com/liuxu0303/EReFormer.
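As a rough illustration of the two components named in the abstract, the sketch below shows (i) a recurrent transformer block that carries a hidden state across event slices via a GRU-style gate, and (ii) a cross-attention skip connection in which decoder tokens query encoder skip features. This is not the authors' released implementation; the class names, tensor shapes, layer sizes, and the specific gating are assumptions made for this example only (see the repository linked above for the actual code).

```python
# Minimal sketch (assumed shapes/sizes), not the EReFormer implementation.
import torch
import torch.nn as nn


class GRViTBlock(nn.Module):
    """Sketch of a gate-recurrent vision transformer block: a transformer
    layer for spatial modeling, followed by a GRU-style gate that fuses the
    current features with a hidden state carried across event slices."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                  batch_first=True)
        # GRUCell applied per token; the hidden state holds temporal cues.
        self.gate = nn.GRUCell(input_size=dim, hidden_size=dim)

    def forward(self, tokens, hidden=None):
        # tokens: (B, N, C) token features of the current event slice
        b, n, c = tokens.shape
        feats = self.spatial(tokens)                 # global spatial modeling
        flat = feats.reshape(b * n, c)
        if hidden is None:
            hidden = torch.zeros_like(flat)
        hidden = self.gate(flat, hidden)             # recurrent temporal gating
        return hidden.reshape(b, n, c), hidden


class CrossAttentionSkip(nn.Module):
    """Sketch of a cross attention-guided skip connection: decoder tokens
    act as queries over encoder skip tokens, fused through a residual path."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec_tokens, enc_tokens):
        fused, _ = self.cross(query=dec_tokens, key=enc_tokens,
                              value=enc_tokens)
        return self.norm(dec_tokens + fused)         # residual fusion


if __name__ == "__main__":
    grvit, casc = GRViTBlock(), CrossAttentionSkip()
    hidden = None
    for _ in range(3):                               # three event slices in sequence
        tokens, hidden = grvit(torch.randn(2, 64, 256), hidden)
    out = casc(torch.randn(2, 64, 256), tokens)
    print(out.shape)                                 # torch.Size([2, 64, 256])
```

The recurrence lets temporal cues accumulate across event slices without re-processing past events, while the residual cross-attention fusion keeps the decoder path intact when the skip features are uninformative.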
Pages: 7417-7429
Number of pages: 13
Related Papers (50 records in total)
  • [31] Rodriguez-Gomez, J. P.; Martinez-de Dios, J. R.; Ollero, A.; Gallego, G. On the Benefits of Visual Stabilization for Frame- and Event-Based Perception. IEEE Robotics and Automation Letters, 2024, 9(10): 8802-8809.
  • [32] Shiba, Shintaro; Klose, Yannick; Aoki, Yoshimitsu; Gallego, Guillermo. Secrets of Event-Based Optical Flow, Depth and Ego-Motion Estimation by Contrast Maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 7742-7759.
  • [33] Liu, Xinhui; Cheng, Meiqi; Shi, Dawei; Shi, Ling. Toward Event-Based State Estimation for Neuromorphic Event Cameras. IEEE Transactions on Automatic Control, 2023, 68(07): 4281-4288.
  • [34] Annamalai, Lakshmi; Thakur, Chetan Singh. EventASEG: An Event-Based Asynchronous Segmentation of Road With Likelihood Attention. IEEE Robotics and Automation Letters, 2024, 9(08): 6951-6958.
  • [35] Fu, J.-Y.; Yu, L.; Yang, W.; Lu, X. Event-based Continuous Optical Flow Estimation. Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49(09): 1845-1856.
  • [36] Feng, Cheng; Zhang, Congxuan; Chen, Zhen; Li, Ming; Chen, Hao; Fan, Bingbing. LW-Net: A Lightweight Network for Monocular Depth Estimation. IEEE Access, 2020, 8: 196287-196298.
  • [37] Papa, Lorenzo; Russo, Paolo; Amerini, Irene. METER: A Mobile Vision Transformer Architecture for Monocular Depth Estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(10): 5882-5893.
  • [38] Wang, Xiuling; Yu, Minglin; Wang, Haixia; Lu, Xiao; Zhang, Zhiguo. Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions. IEEE Sensors Journal, 2024, 24(04): 4978-4991.
  • [39] Kim, Doyeon; Lee, Sihaeng; Lee, Janghyeon; Kim, Junmo. Leveraging Contextual Information for Monocular Depth Estimation. IEEE Access, 2020, 8: 147808-147817.
  • [40] Uddin, S. M. Nadim; Ahmed, Soikat Hasan; Jung, Yong Ju. Unsupervised Deep Event Stereo for Depth Estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7489-7504.