DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions

Cited by: 0
Authors
Shi, Yunxiao [1]
Singh, Manish Kumar [1]
Cai, Hong [1]
Porikli, Fatih [1]
Affiliations
[1] Qualcomm AI Res, San Diego, CA 92121 USA
Keywords
LEARNING DEPTH; NETWORK; VISION;
DOI
10.1109/CVPR52733.2024.01021
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network and sets it on par with the latest, complex transformer-based models. Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features. In addition, we propose normalization techniques to process the point cloud, which improves learning and leads to better accuracy than directly using point transformers off the shelf. Furthermore, we incorporate global attention on downsampled point cloud features, which enables long-range context while still being computationally feasible. We evaluate our method, DeCoTR, on established depth completion benchmarks, including NYU Depth V2 and KITTI, showcasing that it sets new state-of-the-art performance. We further conduct zero-shot evaluations on ScanNet and DDAD benchmarks and demonstrate that DeCoTR has superior generalizability compared to existing approaches.
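The uplifting step mentioned in the abstract, where per-pixel depths are turned into a 3D point cloud that is then normalized, can be illustrated with a minimal sketch. It is not the authors' implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the function names, and the centroid-and-scale normalization are assumptions chosen for illustration, since the abstract does not specify the camera model or the normalization used in DeCoTR.

```python
import numpy as np


def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) point cloud.

    Assumes a pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    h, w = depth.shape
    # u holds pixel column indices, v holds pixel row indices.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def normalize_points(points, eps=1e-8):
    """Center the cloud at its centroid and scale it into the unit ball.

    This is one common normalization, assumed here for illustration only.
    """
    centered = points - points.mean(axis=0, keepdims=True)
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / (scale + eps)


# Toy example: a flat surface 2 units in front of the camera.
depth = np.full((4, 6), 2.0)
points = unproject_depth(depth, fx=5.0, fy=5.0, cx=2.5, cy=1.5)
normed = normalize_points(points)
print(points.shape, normed.shape)
```

In a full pipeline, the 2D feature vector at each pixel would be attached to its corresponding 3D point before the cloud is passed to a point transformer; that part is omitted here.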
Pages: 10736-10746
Page count: 11
Related papers
50 in total
  • [1] DEPTH GENERATION METHOD FOR 2D TO 3D CONVERSION
    Yu, Fengli
    Liu, Ju
    Ren, Yannan
    Sun, Jiande
    Gao, Yuling
    Liu, Wei
    2011 3DTV CONFERENCE: THE TRUE VISION - CAPTURE, TRANSMISSION AND DISPLAY OF 3D VIDEO (3DTV-CON), 2011,
  • [2] Semantic Scene Completion With 2D and 3D Feature Fusion
    Park, Sang-Min
    Ha, Jong-Eun
    IEEE ACCESS, 2024, 12 : 141594 - 141603
  • [3] Enhancing 2D environments with 3D data input
    Patsakis, Constantinos
    Alexandris, Nikolaos
    Flerianou, Elina
    NEW DIRECTIONS IN INTELLIGENT INTERACTIVE MULTIMEDIA SYSTEMS AND SERVICES - 2, 2009, 226 : 321 - 326
  • [4] Enhancing 2D GUIs with 3D input devices
    Patsakis, Constantinos
    Alexandris, Nikolaos
    INTELLIGENT DECISION TECHNOLOGIES-NETHERLANDS, 2010, 4 (03): : 211 - 216
  • [5] The effect of finite depth on 2D and 3D cavitating hydrofoils
    Bal, Sakir
    JOURNAL OF MARINE SCIENCE AND TECHNOLOGY, 2011, 16 (02) : 129 - 142
  • [6] The effect of finite depth on 2D and 3D cavitating hydrofoils
    Sakir Bal
    Journal of Marine Science and Technology, 2011, 16 : 129 - 142
  • [7] Viewing Angle, Depth and Directionality of 2D and 3D Icons
    Lin, Hsuan
    Huang, Kuo-Liang
    Lin, Wei
    HUMAN ASPECTS OF IT FOR THE AGED POPULATION: ACCEPTANCE, COMMUNICATION AND PARTICIPATION, PT I, 2018, 10926 : 307 - 314
  • [8] Depth processing in 2D representations of 3D objects: Is there any?
    Ray, RH
    AUSTRALIAN JOURNAL OF PSYCHOLOGY, 2002, 54 (01) : 51 - 52
  • [9] 2D to 3D Video Conversion via Depth Inference
    Kuo, Tien-Ying
    Hsieh, Cheng-Hong
    Wan, Kuan-Hung
    Chen, Yan-Jhu
    INTELLIGENT SYSTEMS AND APPLICATIONS (ICS 2014), 2015, 274 : 1175 - 1183
  • [10] 2D or 3D?
    Mills, R
    COMPUTER-AIDED ENGINEERING, 1996, 15 (08): : 4 - 4