DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions

Cited by: 0
Authors
Shi, Yunxiao [1 ]
Singh, Manish Kumar [1 ]
Cai, Hong [1 ]
Porikli, Fatih [1 ]
Affiliations
[1] Qualcomm AI Res, San Diego, CA 92121 USA
Keywords
LEARNING DEPTH; NETWORK; VISION
DOI
10.1109/CVPR52733.2024.01021
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network and sets it on par with the latest, complex transformer-based models. Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features. In addition, we propose normalization techniques to process the point cloud, which improves learning and leads to better accuracy than directly using point transformers off the shelf. Furthermore, we incorporate global attention on downsampled point cloud features, which enables long-range context while still being computationally feasible. We evaluate our method, DeCoTR, on established depth completion benchmarks, including NYU Depth V2 and KITTI, showcasing that it sets new state-of-the-art performance. We further conduct zero-shot evaluations on ScanNet and DDAD benchmarks and demonstrate that DeCoTR has superior generalizability compared to existing approaches.
Pages: 10736 - 10746
Page count: 11
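The abstract above describes uplifting 2D image features into a 3D point cloud using the initial depth predictions, then normalizing that cloud before it is processed by a point transformer. Below is a minimal PyTorch sketch of those two steps; the tensor layouts, the back-projection via camera intrinsics, and the center-and-scale normalization are illustrative assumptions, not the authors' released implementation.

```python
import torch


def unproject_to_points(depth, feats, K):
    """Uplift per-pixel 2D features to a 3D point cloud (hedged sketch).

    depth: (B, 1, H, W) initial depth predictions
    feats: (B, C, H, W) 2D feature map from the convolutional branch
    K:     (B, 3, 3)    camera intrinsics
    Returns points (B, H*W, 3) and per-point features (B, H*W, C).
    """
    B, _, H, W = depth.shape
    device, dtype = depth.device, depth.dtype
    # Build a pixel grid in homogeneous coordinates.
    v, u = torch.meshgrid(
        torch.arange(H, device=device, dtype=dtype),
        torch.arange(W, device=device, dtype=dtype),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)  # (3, H*W)
    # Back-project: ray directions scaled by the predicted depth.
    rays = torch.linalg.inv(K) @ pix.unsqueeze(0)          # (B, 3, H*W)
    points = rays * depth.reshape(B, 1, -1)                 # (B, 3, H*W)
    points = points.transpose(1, 2)                         # (B, H*W, 3)
    point_feats = feats.reshape(B, feats.shape[1], -1).transpose(1, 2)
    return points, point_feats


def normalize_points(points, eps=1e-6):
    """Center the cloud and scale it to roughly unit extent.

    One plausible form of the point-cloud normalization the abstract
    refers to (an assumption for illustration).
    """
    centroid = points.mean(dim=1, keepdim=True)
    centered = points - centroid
    scale = centered.norm(dim=-1).amax(dim=1, keepdim=True).clamp(min=eps)
    return centered / scale.unsqueeze(-1)
```

The normalized points and their per-point features would then be passed to the point transformer layers; per the abstract, global attention is applied only on a downsampled version of the cloud to keep long-range context computationally feasible.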