Multimodal Monocular Dense Depth Estimation with Event-Frame Fusion Using Transformer

Times Cited: 0
Authors
Xiao, Baihui [1 ]
Xu, Jingzehua [1 ]
Zhang, Zekai [1 ]
Xing, Tianyu [1 ]
Wang, Jingjing [2 ]
Ren, Yong [3 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2024, PT II | 2024, Vol. 15017
Funding
National Natural Science Foundation of China
Keywords
Frame camera; Event camera; Multimodal fusion; Transformer self-attention; Monocular depth estimation; Vision
DOI
10.1007/978-3-031-72335-3_29
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Frame cameras struggle to estimate depth maps accurately under adverse lighting conditions. In contrast, event cameras, with their high temporal resolution and high dynamic range, capture sparse, asynchronous event streams that record per-pixel brightness changes, compensating for the limitations of frame cameras. However, the potential of asynchronous events remains underexploited, which hinders event cameras from predicting dense depth maps effectively. Integrating event streams with frame data can significantly improve monocular depth estimation accuracy, especially in complex scenarios. In this study, we introduce a novel depth estimation framework that combines event and frame data using a transformer-based model. The proposed framework comprises two primary components: a multimodal encoder and a joint decoder. The multimodal encoder employs self-attention to model interactions between frame patches and event tensors, capturing dependencies across local and global spatiotemporal events. This multi-scale fusion approach maximizes the benefits of both event and frame inputs. The joint decoder incorporates a dual-phase, triple-scale feature fusion module, which extracts contextual information and delivers detailed depth predictions. Experimental results on the EventScape and MVSEC datasets confirm that our method sets a new performance benchmark.
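The abstract references two components it leaves unspecified: the "event tensor" representation fed to the encoder, and the self-attention fusion over frame patches and event tensors. The two PyTorch sketches below illustrate one common reading of each; all function and module names, modality tags, and hyperparameters are illustrative assumptions, not the authors' implementation. First, asynchronous event streams are often discretized into a fixed-size voxel grid (e.g., the bilinearly interpolated grid of Zhu et al., CVPR 2019) so that a convolutional patch embedding can consume them:

```python
import torch

def events_to_voxel_grid(xs, ys, ts, ps, bins=5, H=256, W=256):
    """Sketch: turn an asynchronous event stream (pixel coords xs, ys,
    timestamps ts, polarities ps in {-1, +1}) into a (bins, H, W) 'event
    tensor'. Hypothetical helper following the common voxel-grid recipe,
    not the paper's code."""
    grid = torch.zeros(bins, H, W)
    span = max(float(ts[-1] - ts[0]), 1e-9)          # avoid divide-by-zero
    t = (ts - ts[0]).float() / span * (bins - 1)     # timestamps -> [0, bins-1]
    b0 = t.floor().long().clamp(0, bins - 1)
    w1 = t - b0.float()                              # weight toward the next bin
    for b, w in ((b0, 1.0 - w1), ((b0 + 1).clamp(max=bins - 1), w1)):
        # Bilinear interpolation in time: split each event's polarity
        # between its two neighboring temporal bins.
        grid.index_put_((b, ys.long(), xs.long()), ps.float() * w,
                        accumulate=True)
    return grid
```

Given such tensors, a minimal reading of the multimodal encoder is a shared transformer that self-attends over the concatenation of frame-patch tokens and event-patch tokens, so that every attention head can mix the two modalities at all spatial positions:

```python
import torch
import torch.nn as nn

class EventFrameFusionEncoder(nn.Module):
    """Sketch of self-attention fusion over frame patches and event
    tensors. Hypothetical single-scale module; the paper's encoder is
    multi-scale and its exact design is not reproduced here."""

    def __init__(self, img_ch=3, evt_bins=5, dim=256, patch=16,
                 heads=8, layers=4):
        super().__init__()
        # ViT-style patch embeddings, one per modality (assumed);
        # positional embeddings are omitted for brevity.
        self.frame_embed = nn.Conv2d(img_ch, dim, kernel_size=patch, stride=patch)
        self.event_embed = nn.Conv2d(evt_bins, dim, kernel_size=patch, stride=patch)
        # Learnable modality tags so attention can tell tokens apart.
        self.frame_tag = nn.Parameter(torch.zeros(1, 1, dim))
        self.event_tag = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, frame, event_voxels):
        # frame: (B, 3, H, W); event_voxels: (B, evt_bins, H, W)
        f = self.frame_embed(frame).flatten(2).transpose(1, 2)         # (B, N, dim)
        e = self.event_embed(event_voxels).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = torch.cat([f + self.frame_tag, e + self.event_tag], dim=1)
        # Joint self-attention captures local and global frame <-> event
        # dependencies in a single token sequence.
        return self.encoder(tokens)                                    # (B, 2N, dim)

# Usage: fused = EventFrameFusionEncoder()(torch.randn(1, 3, 256, 256),
#                                          torch.randn(1, 5, 256, 256))
```

A joint decoder would then upsample and fuse such tokens at several scales to produce the dense depth map; that stage is not sketched here, since the abstract gives no further detail on its dual-phase, triple-scale design.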
Pages: 419-433
Page count: 15
Related Papers
50 records
  • [21] Simultaneous Monocular Endoscopic Dense Depth and Odometry Estimation Using Local-Global Integration Networks
    Fan, Wenkang
    Jiang, Wenjing
    Fang, Hao
    Shi, Hong
    Chen, Jianhua
    Luo, Xiongbiao
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VI, 2024, 15006 : 564 - 574
  • [22] TAMDepth: self-supervised monocular depth estimation with transformer and adapter modulation
    Li, Shaokang
    Lyu, Chengzhi
    Xia, Bin
    Chen, Ziheng
    Zhang, Lei
    VISUAL COMPUTER, 2024, 40 (10) : 6797 - 6808
  • [23] SNN-ANN Hybrid Networks for Embedded Multimodal Monocular Depth Estimation
    Tumpa, Sadia Anjum
    Devulapally, Anusha
    Brehove, Matthew
    Kyubwa, Espoir
    Narayanan, Vijaykrishnan
    2024 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI, ISVLSI, 2024, : 198 - 203
  • [24] Fully convolutional multi-scale dense networks for monocular depth estimation
    Liu, Jiwei
    Zhang, Yunzhou
    Cui, Jiahua
    Feng, Yonghui
    Pang, Linzhuo
    IET COMPUTER VISION, 2019, 13 (05) : 515 - 522
  • [25] Dense Depth Estimation in Monocular Endoscopy With Self-Supervised Learning Methods
    Liu, Xingtong
    Sinha, Ayushi
    Ishii, Masaru
    Hager, Gregory D.
    Reiter, Austin
    Taylor, Russell H.
    Unberath, Mathias
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (05) : 1438 - 1447
  • [26] FF-GAN: Feature Fusion GAN for Monocular Depth Estimation
    Jia, Ruiming
    Li, Tong
    Yuan, Fei
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, PRCV 2020, 2020, 12305 : 167 - 179
  • [27] Monocular Depth and Velocity Estimation Based on Multi-Cue Fusion
    Qi, Chunyang
    Zhao, Hongxiang
    Song, Chuanxue
    Zhang, Naifu
    Song, Sinxin
    Xu, Haigang
    Xiao, Feng
    MACHINES, 2022, 10 (05)
  • [28] Depth cue fusion for event-based stereo depth estimation
    Ghosh, Dipon Kumar
    Jung, Yong Ju
    INFORMATION FUSION, 2025, 117
  • [29] Bayesian DeNet: Monocular Depth Prediction and Frame-Wise Fusion With Synchronized Uncertainty
    Yang, Xin
    Gao, Yang
    Luo, Hongcheng
    Liao, Chunyuan
    Cheng, Kwang-Ting
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (11) : 2701 - 2713
  • [30] Monocular Depth Estimation Using Deep Learning: A Review
    Masoumian, Armin
    Rashwan, Hatem A.
    Cristiano, Julian
    Asif, M. Salman
    Puig, Domenec
    SENSORS, 2022, 22 (14)