Multimodal Monocular Dense Depth Estimation with Event-Frame Fusion Using Transformer

Cited: 0
Authors
Xiao, Baihui [1 ]
Xu, Jingzehua [1 ]
Zhang, Zekai [1 ]
Xing, Tianyu [1 ]
Wang, Jingjing [2 ]
Ren, Yong [3 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
[3] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2024, PT II | 2024, Vol. 15017
Funding
National Natural Science Foundation of China;
Keywords
Frame Camera; Event Camera; Multi-modal Fusion; Transformer self-attention; Monocular depth estimation; VISION;
DOI
10.1007/978-3-031-72335-3_29
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Frame cameras struggle to estimate depth maps accurately under adverse lighting conditions. In contrast, event cameras, with their high temporal resolution and high dynamic range, capture sparse, asynchronous event streams that record per-pixel brightness changes, addressing the limitations of frame cameras. However, the potential of asynchronous events remains underexploited, which hinders event cameras from predicting dense depth maps effectively. Integrating event streams with frame data can significantly enhance monocular depth estimation accuracy, especially in complex scenarios. In this study, we introduce a novel depth estimation framework that fuses event and frame data with a transformer-based model. The proposed framework contains two primary components: a multimodal encoder and a joint decoder. The multimodal encoder employs self-attention mechanisms to model the interactions between frame patches and event tensors, capturing dependencies across local and global spatiotemporal events. This multi-scale fusion approach maximizes the benefits of both event and frame inputs. The joint decoder incorporates a dual-phase, triple-scale feature fusion module that extracts contextual information and produces detailed depth predictions. Experimental results on the EventScape and MVSEC datasets confirm that our method sets a new performance benchmark.
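To make the encoder idea concrete, the following is a minimal PyTorch sketch of the kind of event-frame token fusion the abstract describes: frame patches and an event voxel grid are embedded into a shared token space, and transformer self-attention runs over the concatenated sequence so dependencies between the two modalities are modeled jointly. The module name, the 5-bin event voxelization, the 16-pixel patch size, and all dimensions are illustrative assumptions, not the authors' implementation (positional embeddings and multi-scale fusion are omitted for brevity).

```python
# Hedged sketch of event-frame fusion via transformer self-attention.
# Names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class EventFrameFusionEncoder(nn.Module):
    def __init__(self, img_channels=3, event_bins=5, patch=16, dim=256,
                 depth=4, heads=8):
        super().__init__()
        # Patch embeddings: one strided-conv projection per modality
        # into a shared token dimension.
        self.frame_embed = nn.Conv2d(img_channels, dim, patch, stride=patch)
        self.event_embed = nn.Conv2d(event_bins, dim, patch, stride=patch)
        # Learned modality embeddings so attention can distinguish
        # frame tokens from event tokens.
        self.frame_type = nn.Parameter(torch.zeros(1, 1, dim))
        self.event_type = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frame, event_voxel):
        # frame:       (B, 3, H, W) intensity image
        # event_voxel: (B, T, H, W) events accumulated into T temporal bins
        f = self.frame_embed(frame).flatten(2).transpose(1, 2)        # (B, N, dim)
        e = self.event_embed(event_voxel).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = torch.cat([f + self.frame_type, e + self.event_type], dim=1)
        # Self-attention over the concatenated sequence lets every frame
        # patch attend to every event patch and vice versa (cross-modal fusion).
        return self.encoder(tokens)

# Usage: the fused tokens would feed a decoder that upsamples back to a
# dense depth map; the paper's joint decoder fuses features at three scales.
enc = EventFrameFusionEncoder()
fused = enc(torch.randn(1, 3, 256, 256), torch.randn(1, 5, 256, 256))
print(fused.shape)  # torch.Size([1, 512, 256]): 2 * (256/16)**2 tokens
```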
Pages: 419-433
Number of pages: 15