Event-Based Monocular Depth Estimation With Recurrent Transformers

Cited by: 5
Authors
Liu, Xu [1 ,2 ]
Li, Jianing [3 ]
Shi, Jinqiao [4 ]
Fan, Xiaopeng [1 ,2 ]
Tian, Yonghong [2 ,3 ]
Zhao, Debin [1 ,2 ]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Interac, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[3] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China
[4] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Estimation; Cameras; Voltage control; Task analysis; Streaming media; Circuits and systems; Event camera; monocular depth estimator; recurrent transformer; cross attention; VISION;
DOI
10.1109/TCSVT.2024.3378742
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Event cameras, offering high temporal resolution and high dynamic range, have brought a new perspective to common challenges in monocular depth estimation (e.g., motion blur and low light). However, existing CNN-based methods insufficiently exploit global spatial information from asynchronous events, while RNN-based methods show a limited capacity to utilize temporal cues effectively for event-based monocular depth estimation. To this end, we propose an event-based monocular depth estimator with recurrent transformers, namely EReFormer. Technically, we first design a transformer-based encoder-decoder that utilizes multi-scale features to model global spatial information from events. Then, we propose a Gate Recurrent Vision Transformer (GRViT), which introduces a recursive mechanism into transformers to leverage rich temporal cues from events. Finally, we present a Cross Attention-guided Skip Connection (CASC), which performs cross attention to fuse multi-scale features and improve global spatial modeling. The experimental results show that our EReFormer outperforms state-of-the-art methods by a clear margin on both synthetic and real-world datasets. Our open-source code is available at https://github.com/liuxu0303/EReFormer.
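As a rough illustration of the two components named in the abstract, the sketch below shows (i) a recurrent transformer block that carries a hidden state across event slices via a GRU-style gate, and (ii) a cross-attention skip connection in which decoder tokens query encoder skip features. This is not the authors' released implementation; the class names, tensor shapes, layer sizes, and the specific gating are assumptions made for this example only (see the repository linked above for the actual code).

```python
# Minimal sketch (assumed shapes/sizes), not the EReFormer implementation.
import torch
import torch.nn as nn


class GRViTBlock(nn.Module):
    """Sketch of a gate-recurrent vision transformer block: a transformer
    layer for spatial modeling, followed by a GRU-style gate that fuses the
    current features with a hidden state carried across event slices."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.spatial = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                  batch_first=True)
        # GRUCell applied per token; the hidden state holds temporal cues.
        self.gate = nn.GRUCell(input_size=dim, hidden_size=dim)

    def forward(self, tokens, hidden=None):
        # tokens: (B, N, C) token features of the current event slice
        b, n, c = tokens.shape
        feats = self.spatial(tokens)                 # global spatial modeling
        flat = feats.reshape(b * n, c)
        if hidden is None:
            hidden = torch.zeros_like(flat)
        hidden = self.gate(flat, hidden)             # recurrent temporal gating
        return hidden.reshape(b, n, c), hidden


class CrossAttentionSkip(nn.Module):
    """Sketch of a cross attention-guided skip connection: decoder tokens
    act as queries over encoder skip tokens, fused through a residual path."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, dec_tokens, enc_tokens):
        fused, _ = self.cross(query=dec_tokens, key=enc_tokens,
                              value=enc_tokens)
        return self.norm(dec_tokens + fused)         # residual fusion


if __name__ == "__main__":
    grvit, casc = GRViTBlock(), CrossAttentionSkip()
    hidden = None
    for _ in range(3):                               # three event slices in sequence
        tokens, hidden = grvit(torch.randn(2, 64, 256), hidden)
    out = casc(torch.randn(2, 64, 256), tokens)
    print(out.shape)                                 # torch.Size([2, 64, 256])
```

The recurrence lets temporal cues accumulate across event slices without re-processing past events, while the residual cross-attention fusion keeps the decoder path intact when the skip features are uninformative.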
Pages: 7417-7429
Number of pages: 13
Related Papers (50 records in total)
  • [31] Rodriguez-Gomez, J. P.; Martinez-de Dios, J. R.; Ollero, A.; Gallego, G. On the Benefits of Visual Stabilization for Frame- and Event-Based Perception. IEEE Robotics and Automation Letters, 2024, 9(10): 8802-8809.
  • [32] Shiba, Shintaro; Klose, Yannick; Aoki, Yoshimitsu; Gallego, Guillermo. Secrets of Event-Based Optical Flow, Depth and Ego-Motion Estimation by Contrast Maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 7742-7759.
  • [33] Liu, Xinhui; Cheng, Meiqi; Shi, Dawei; Shi, Ling. Toward Event-Based State Estimation for Neuromorphic Event Cameras. IEEE Transactions on Automatic Control, 2023, 68(07): 4281-4288.
  • [34] Annamalai, Lakshmi; Thakur, Chetan Singh. EventASEG: An Event-Based Asynchronous Segmentation of Road With Likelihood Attention. IEEE Robotics and Automation Letters, 2024, 9(08): 6951-6958.
  • [35] Fu, J.-Y.; Yu, L.; Yang, W.; Lu, X. Event-based Continuous Optical Flow Estimation. Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49(09): 1845-1856.
  • [36] Feng, Cheng; Zhang, Congxuan; Chen, Zhen; Li, Ming; Chen, Hao; Fan, Bingbing. LW-Net: A Lightweight Network for Monocular Depth Estimation. IEEE Access, 2020, 8: 196287-196298.
  • [37] Papa, Lorenzo; Russo, Paolo; Amerini, Irene. METER: A Mobile Vision Transformer Architecture for Monocular Depth Estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(10): 5882-5893.
  • [38] Wang, Xiuling; Yu, Minglin; Wang, Haixia; Lu, Xiao; Zhang, Zhiguo. Self-Supervised Monocular Depth Estimation Based on High-Order Spatial Interactions. IEEE Sensors Journal, 2024, 24(04): 4978-4991.
  • [39] Kim, Doyeon; Lee, Sihaeng; Lee, Janghyeon; Kim, Junmo. Leveraging Contextual Information for Monocular Depth Estimation. IEEE Access, 2020, 8: 147808-147817.
  • [40] Uddin, S. M. Nadim; Ahmed, Soikat Hasan; Jung, Yong Ju. Unsupervised Deep Event Stereo for Depth Estimation. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7489-7504.