Event-Based Monocular Depth Estimation With Recurrent Transformers

Cited by: 5
Authors
Liu, Xu [1, 2]
Li, Jianing [3]
Shi, Jinqiao [4]
Fan, Xiaopeng [1, 2]
Tian, Yonghong [2, 3]
Zhao, Debin [1, 2]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Interac, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[3] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China
[4] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Transformers; Estimation; Cameras; Voltage control; Task analysis; Streaming media; Circuits and systems; Event camera; monocular depth estimator; recurrent transformer; cross attention; VISION;
DOI
10.1109/TCSVT.2024.3378742
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
Event cameras, which offer high temporal resolution and high dynamic range, bring a new perspective to common challenges in monocular depth estimation (e.g., motion blur and low light). However, existing CNN-based methods insufficiently exploit global spatial information from asynchronous events, while RNN-based methods have a limited capacity to exploit temporal cues effectively for event-based monocular depth estimation. To this end, we propose an event-based monocular depth estimator with recurrent transformers, namely EReFormer. Technically, we first design a transformer-based encoder-decoder that uses multi-scale features to model global spatial information from events. Then, we propose a Gate Recurrent Vision Transformer (GRViT), which introduces a recursive mechanism into transformers to leverage rich temporal cues from events. Finally, we present a Cross Attention-guided Skip Connection (CASC), which performs cross attention to fuse multi-scale features and thereby improves global spatial modeling. Experimental results show that EReFormer outperforms state-of-the-art methods by a margin on both synthetic and real-world datasets. Our open-source code is available at https://github.com/liuxu0303/EReFormer.
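The abstract names two mechanisms, a gated recurrence inside a vision transformer (GRViT) and cross-attention fusion across skip connections (CASC), without giving their equations. The pure-Python sketch below only illustrates those two general ideas under loudly stated assumptions; it is not the EReFormer implementation (the linked repository is the authority for that). Assumptions: there are no learned query/key/value projections, no multi-head split, and no normalization; the update gate is a toy sigmoid of input plus hidden state; there is no reset gate; and every function name (`gated_recurrent_step`, `cross_attention_skip`, etc.) is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are channels


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(x: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in x:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query token attends over all key tokens.

    Queries may have a different number of tokens than keys/values, which is what
    lets features from one scale attend to features from another.
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])
    return matmul(weights, values)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_recurrent_step(hidden: Matrix, inputs: Matrix) -> Matrix:
    """Toy GRU-style update (assumption: no learned weights, no reset gate).

    Current-step tokens attend to the previous hidden tokens to form a candidate
    state; an update gate z then blends the old state with that candidate.
    """
    context = cross_attention(inputs, hidden, hidden)
    new_hidden = []
    for h_row, x_row, c_row in zip(hidden, inputs, context):
        z = [sigmoid(x + h) for x, h in zip(x_row, h_row)]       # update gate in (0, 1)
        cand = [math.tanh(x + c) for x, c in zip(x_row, c_row)]  # candidate in (-1, 1)
        new_hidden.append([(1 - zi) * h + zi * ci for zi, h, ci in zip(z, h_row, cand)])
    return new_hidden


def cross_attention_skip(decoder: Matrix, encoder: Matrix) -> Matrix:
    """Hypothetical skip connection: decoder tokens query encoder tokens, plus a residual."""
    attended = cross_attention(decoder, encoder, encoder)
    return [[d + a for d, a in zip(d_row, a_row)] for d_row, a_row in zip(decoder, attended)]


if __name__ == "__main__":
    # Three time steps of 2 tokens x 2 channels (made-up numbers, not event data).
    steps = [[[0.1, 0.2], [0.3, -0.1]],
             [[0.0, 0.5], [-0.2, 0.4]],
             [[0.6, -0.3], [0.1, 0.1]]]
    hidden = [[0.0, 0.0], [0.0, 0.0]]
    for x in steps:
        hidden = gated_recurrent_step(hidden, x)
    # An encoder map with 4 tokens fused into the 2 decoder tokens.
    encoder = [[0.2, 0.1], [0.0, 0.3], [0.4, 0.4], [-0.1, 0.2]]
    print(cross_attention_skip(hidden, encoder))
```

Because queries and keys may have different token counts, the skip-connection sketch can fuse a 4-token encoder map into 2 decoder tokens; this is the property that makes cross attention usable between feature maps of different resolutions. A trained model would replace the identity projections and toy gate with learned parameters.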
Pages: 7417-7429
Number of pages: 13
Related papers (50 in total)
  • [21] Adversarial Patch Attacks on Monocular Depth Estimation Networks
    Yamanaka, Koichiro
    Matsumoto, Ryutaroh
    Takahashi, Keita
    Fujii, Toshiaki
    IEEE ACCESS, 2020, 8 : 179094 - 179104
  • [22] SAAM: Stealthy Adversarial Attack on Monocular Depth Estimation
    Guesmi, Amira
    Hanif, Muhammad Abdullah
    Ouni, Bassem
    Shafique, Muhammad
    IEEE ACCESS, 2024, 12 : 13571 - 13585
  • [23] Monocular Depth Estimation With Augmented Ordinal Depth Relationships
    Cao, Yuanzhouhan
    Zhao, Tianqi
    Xian, Ke
    Shen, Chunhua
    Cao, Zhiguo
    Xu, Shugong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (08) : 2674 - 2682
  • [24] Fast Event-Based Optical Flow Estimation by Triplet Matching
    Shiba, Shintaro
    Aoki, Yoshimitsu
    Gallego, Guillermo
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2712 - 2716
  • [25] Event Anonymization: Privacy-Preserving Person Re-Identification and Pose Estimation in Event-Based Vision
    Ahmad, Shafiq
    Morerio, Pietro
    Del Bue, Alessio
    IEEE ACCESS, 2024, 12 : 66964 - 66980
  • [26] Event-based video reconstruction via attention-based recurrent network
    Ma, Wenwen
    Ma, Shanxing
    Meiresone, Pieter
    Allebosch, Gianni
    Philips, Wilfried
    Aelterman, Jan
    NEUROCOMPUTING, 2025, 632
  • [27] Unsupervised Monocular Depth Estimation for Monocular Visual SLAM Systems
    Liu, Feng
    Huang, Ming
    Ge, Hongyu
    Tao, Dan
    Gao, Ruipeng
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 13
  • [28] Robust Event-Based Vision Model Estimation by Dispersion Minimisation
    Nunes, Urbano Miguel
    Demiris, Yiannis
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 9561 - 9573
  • [29] Two-stage cross-fusion network for stereo event-based depth estimation
    Ghosh, Dipon Kumar
    Jung, Yong Ju
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 241
  • [30] Self-supervised Monocular Pose and Depth Estimation for Wireless Capsule Endoscopy with Transformers
    Nazifi, Nahid
    Araujo, Helder
    Erabati, Gopi Krishna
    Tahri, Omar
    IMAGE-GUIDED PROCEDURES, ROBOTIC INTERVENTIONS, AND MODELING, MEDICAL IMAGING 2024, 2024, 12928