Event-Based Monocular Depth Estimation With Recurrent Transformers

Cited by: 5
Authors
Liu, Xu [1 ,2 ]
Li, Jianing [3 ]
Shi, Jinqiao [4 ]
Fan, Xiaopeng [1 ,2 ]
Tian, Yonghong [2 ,3 ]
Zhao, Debin [1 ,2 ]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Interac, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518000, Peoples R China
[3] Peking Univ, Sch Comp Sci, Beijing 100871, Peoples R China
[4] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Estimation; Cameras; Voltage control; Task analysis; Streaming media; Circuits and systems; Event camera; monocular depth estimator; recurrent transformer; cross attention; VISION;
DOI
10.1109/TCSVT.2024.3378742
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Event cameras, offering high temporal resolution and a high dynamic range, have brought a new perspective to common challenges in monocular depth estimation (e.g., motion blur and low light). However, existing CNN-based methods insufficiently exploit the global spatial information in asynchronous events, while RNN-based methods have a limited capacity to exploit temporal cues for event-based monocular depth estimation. To this end, we propose an event-based monocular depth estimator with recurrent transformers, namely EReFormer. Technically, we first design a transformer-based encoder-decoder that utilizes multi-scale features to model global spatial information from events. Then, we propose a Gate Recurrent Vision Transformer (GRViT), which introduces a recursive mechanism into transformers to leverage rich temporal cues from events. Finally, we present a Cross Attention-guided Skip Connection (CASC), which performs cross attention to fuse multi-scale features and thus improves global spatial modeling. Experimental results show that our EReFormer outperforms state-of-the-art methods on both synthetic and real-world datasets. Our open-source code is available at https://github.com/liuxu0303/EReFormer.
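The abstract outlines two architectural ideas: a Gate Recurrent Vision Transformer (GRViT) that adds a GRU-style recursive state to a transformer block so temporal cues can propagate across consecutive event slices, and a Cross Attention-guided Skip Connection (CASC) that fuses encoder and decoder features with cross attention rather than plain concatenation. The PyTorch sketch below is only a minimal illustration of these two ideas, not the authors' released implementation (see the linked repository for that); the module names GRViTBlock and CrossAttentionSkip, the exact placement of the gating, and all dimensions are assumptions made for this example.

import torch
import torch.nn as nn


class GRViTBlock(nn.Module):
    """Transformer block whose output tokens are fused with a hidden state
    carried across event slices through a GRU-style gate (assumed design)."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # GRU-style gates operating on concatenated [current tokens, hidden state].
        self.update_gate = nn.Linear(2 * dim, dim)
        self.reset_gate = nn.Linear(2 * dim, dim)
        self.candidate = nn.Linear(2 * dim, dim)

    def forward(self, x, h=None):
        # Standard self-attention + MLP on the tokens of the current event slice.
        n = self.norm1(x)
        x = x + self.attn(n, n, n, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        if h is None:
            h = torch.zeros_like(x)
        # Gated recurrent fusion of the current tokens with the temporal state.
        z = torch.sigmoid(self.update_gate(torch.cat([x, h], dim=-1)))
        r = torch.sigmoid(self.reset_gate(torch.cat([x, h], dim=-1)))
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * h], dim=-1)))
        h_new = (1 - z) * h + z * h_tilde
        return h_new, h_new  # output tokens and the state passed to the next slice


class CrossAttentionSkip(nn.Module):
    """Skip connection that fuses decoder tokens (queries) with same-scale
    encoder tokens (keys/values) via cross attention instead of concatenation."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, dec_tokens, enc_tokens):
        q = self.norm_q(dec_tokens)
        kv = self.norm_kv(enc_tokens)
        fused, _ = self.cross_attn(q, kv, kv, need_weights=False)
        return dec_tokens + fused


if __name__ == "__main__":
    B, N, D = 2, 196, 64  # batch, tokens per event slice, embedding dim (assumed)
    block, skip = GRViTBlock(D), CrossAttentionSkip(D)
    hidden = None
    for _ in range(5):  # iterate over a stream of event-voxel slices
        tokens = torch.randn(B, N, D)
        out, hidden = block(tokens, hidden)
    print(skip(out, torch.randn(B, N, D)).shape)  # torch.Size([2, 196, 64])

Carrying the hidden state across slices is what distinguishes this from a stateless transformer encoder: each new event slice is processed with self-attention, then blended with the accumulated temporal state through the update and reset gates.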
Pages: 7417-7429
Page count: 13
Related Papers
50 records in total
  • [1] Swin-Depth: Using Transformers and Multi-Scale Fusion for Monocular-Based Depth Estimation
    Cheng, Zeyu
    Zhang, Yi
    Tang, Chengkai
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 26912 - 26920
  • [2] Event-Based Depth Prediction With Deep Spiking Neural Network
    Wu, Xiaoshan
    He, Weihua
    Yao, Man
    Zhang, Ziyang
    Wang, Yaoyuan
    Xu, Bo
    Li, Guoqi
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2024, 16 (06) : 2008 - 2018
  • [3] Depth cue fusion for event-based stereo depth estimation
    Ghosh, Dipon Kumar
    Jung, Yong Ju
    INFORMATION FUSION, 2025, 117
  • [4] Underwater Monocular Depth Estimation Based on Physical-Guided Transformer
    Wang, Chen
    Xu, Haiyong
    Jiang, Gangyi
    Yu, Mei
    Luo, Ting
    Chen, Yeyao
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 18 - 18
  • [5] Real-Time Monocular Depth Estimation Merging Vision Transformers on Edge Devices for AIoT
    Liu, Xihao
    Wei, Wei
    Liu, Cheng
    Peng, Yuyang
    Huang, Jinhao
    Li, Jun
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2023, 72
  • [6] Event-Based Semantic Segmentation With Posterior Attention
    Jia, Zexi
    You, Kaichao
    He, Weihua
    Tian, Yang
    Feng, Yongxiang
    Wang, Yaoyuan
    Jia, Xu
    Lou, Yihang
    Zhang, Jingyi
    Li, Guoqi
    Zhang, Ziyang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1829 - 1842
  • [7] A Study on the Generality of Neural Network Structures for Monocular Depth Estimation
    Bae, Jinwoo
    Hwang, Kyumin
    Im, Sunghoon
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (04) : 2224 - 2238
  • [8] Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation
    Liu, Li
    Zhu, Ruijie
    Deng, Jiacheng
    Song, Ziyang
    Yang, Wenfei
    Zhang, Tianzhu
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02) : 1136 - 1149
  • [9] A Monocular SLAM System Based on ResNet Depth Estimation
    Li, Zheng
    Yu, Lei
    Pan, Zihao
    IEEE SENSORS JOURNAL, 2023, 23 (13) : 15106 - 15114
  • [10] Semantic Monocular Depth Estimation Based on Artificial Intelligence
    Gurram, Akhil
    Urfalioglu, Onay
    Halfaoui, Ibrahim
    Bouzaraa, Fahd
    Lopez, Antonio M.
    IEEE INTELLIGENT TRANSPORTATION SYSTEMS MAGAZINE, 2021, 13 (04) : 99 - 103