Event-Based Monocular Depth Estimation With Recurrent Transformers

Cited by: 5
Authors
Liu, Xu [1 ,2 ]
Li, Jianing [3 ]
Shi, Jinqiao [4 ]
Fan, Xiaopeng [1 ,2 ]
Tian, Yonghong [2 ,3 ]
Zhao, Debin [1 ,2 ]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Interac, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[3] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China
[4] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Estimation; Cameras; Voltage control; Task analysis; Streaming media; Circuits and systems; Event camera; monocular depth estimator; recurrent transformer; cross attention; VISION;
DOI
10.1109/TCSVT.2024.3378742
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Event cameras, offering high temporal resolution and a high dynamic range, have brought a new perspective to common challenges in monocular depth estimation (e.g., motion blur and low light). However, existing CNN-based methods insufficiently exploit the global spatial information in asynchronous events, while RNN-based methods have a limited capacity to exploit temporal cues for event-based monocular depth estimation. To this end, we propose an event-based monocular depth estimator with recurrent transformers, namely EReFormer. Technically, we first design a transformer-based encoder-decoder that utilizes multi-scale features to model global spatial information from events. Then, we propose a Gate Recurrent Vision Transformer (GRViT), which introduces a recursive mechanism into transformers to leverage rich temporal cues from events. Finally, we present a Cross Attention-guided Skip Connection (CASC), which performs cross attention to fuse multi-scale features and thus improves global spatial modeling. Experimental results show that our EReFormer outperforms state-of-the-art methods on both synthetic and real-world datasets. Our open-source code is available at https://github.com/liuxu0303/EReFormer.
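The abstract outlines two architectural ideas: a Gate Recurrent Vision Transformer (GRViT) that adds a GRU-style recursive state to a transformer block so temporal cues can propagate across consecutive event slices, and a Cross Attention-guided Skip Connection (CASC) that fuses encoder and decoder features with cross attention rather than plain concatenation. The PyTorch sketch below is only a minimal illustration of these two ideas, not the authors' released implementation (see the linked repository for that); the module names GRViTBlock and CrossAttentionSkip, the exact placement of the gating, and all dimensions are assumptions made for this example.

import torch
import torch.nn as nn


class GRViTBlock(nn.Module):
    """Transformer block whose output tokens are fused with a hidden state
    carried across event slices through a GRU-style gate (assumed design)."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # GRU-style gates operating on concatenated [current tokens, hidden state].
        self.update_gate = nn.Linear(2 * dim, dim)
        self.reset_gate = nn.Linear(2 * dim, dim)
        self.candidate = nn.Linear(2 * dim, dim)

    def forward(self, x, h=None):
        # Standard self-attention + MLP on the tokens of the current event slice.
        n = self.norm1(x)
        x = x + self.attn(n, n, n, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        if h is None:
            h = torch.zeros_like(x)
        # Gated recurrent fusion of the current tokens with the temporal state.
        z = torch.sigmoid(self.update_gate(torch.cat([x, h], dim=-1)))
        r = torch.sigmoid(self.reset_gate(torch.cat([x, h], dim=-1)))
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * h], dim=-1)))
        h_new = (1 - z) * h + z * h_tilde
        return h_new, h_new  # output tokens and the state passed to the next slice


class CrossAttentionSkip(nn.Module):
    """Skip connection that fuses decoder tokens (queries) with same-scale
    encoder tokens (keys/values) via cross attention instead of concatenation."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, dec_tokens, enc_tokens):
        q = self.norm_q(dec_tokens)
        kv = self.norm_kv(enc_tokens)
        fused, _ = self.cross_attn(q, kv, kv, need_weights=False)
        return dec_tokens + fused


if __name__ == "__main__":
    B, N, D = 2, 196, 64  # batch, tokens per event slice, embedding dim (assumed)
    block, skip = GRViTBlock(D), CrossAttentionSkip(D)
    hidden = None
    for _ in range(5):  # iterate over a stream of event-voxel slices
        tokens = torch.randn(B, N, D)
        out, hidden = block(tokens, hidden)
    print(skip(out, torch.randn(B, N, D)).shape)  # torch.Size([2, 196, 64])

Carrying the hidden state across slices is what distinguishes this from a stateless transformer encoder: each new event slice is processed with self-attention, then blended with the accumulated temporal state through the update and reset gates.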
Pages: 7417-7429
Page count: 13
Related Papers
50 records in total
  • [1] Swin-Depth: Using Transformers and Multi-Scale Fusion for Monocular-Based Depth Estimation
    Cheng, Zeyu
    Zhang, Yi
    Tang, Chengkai
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 26912 - 26920
  • [2] Event-Based Depth Prediction With Deep Spiking Neural Network
    Wu, Xiaoshan
    He, Weihua
    Yao, Man
    Zhang, Ziyang
    Wang, Yaoyuan
    Xu, Bo
    Li, Guoqi
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (06) : 2008 - 2018
  • [3] Depth cue fusion for event-based stereo depth estimation
    Ghosh, Dipon Kumar
    Jung, Yong Ju
    INFORMATION FUSION, 2025, 117
  • [4] Underwater Monocular Depth Estimation Based on Physical-Guided Transformer
    Wang, Chen
    Xu, Haiyong
    Jiang, Gangyi
    Yu, Mei
    Luo, Ting
    Chen, Yeyao
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 18 - 18
  • [5] Real-Time Monocular Depth Estimation Merging Vision Transformers on Edge Devices for AIoT
    Liu, Xihao
    Wei, Wei
    Liu, Cheng
    Peng, Yuyang
    Huang, Jinhao
    Li, Jun
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [6] Event-Based Semantic Segmentation With Posterior Attention
    Jia, Zexi
    You, Kaichao
    He, Weihua
    Tian, Yang
    Feng, Yongxiang
    Wang, Yaoyuan
    Jia, Xu
    Lou, Yihang
    Zhang, Jingyi
    Li, Guoqi
    Zhang, Ziyang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1829 - 1842
  • [7] A Study on the Generality of Neural Network Structures for Monocular Depth Estimation
    Bae, Jinwoo
    Hwang, Kyumin
    Im, Sunghoon
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (04) : 2224 - 2238
  • [8] Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation
    Liu, Li
    Zhu, Ruijie
    Deng, Jiacheng
    Song, Ziyang
    Yang, Wenfei
    Zhang, Tianzhu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02) : 1136 - 1149
  • [9] A Monocular SLAM System Based on ResNet Depth Estimation
    Li, Zheng
    Yu, Lei
    Pan, Zihao
    IEEE SENSORS JOURNAL, 2023, 23 (13) : 15106 - 15114
  • [10] Semantic Monocular Depth Estimation Based on Artificial Intelligence
    Gurram, Akhil
    Urfalioglu, Onay
    Halfaoui, Ibrahim
    Bouzaraa, Fahd
    Lopez, Antonio M.
    IEEE INTELLIGENT TRANSPORTATION SYSTEMS MAGAZINE, 2021, 13 (04) : 99 - 103