DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions

Cited by: 0
Authors
Shi, Yunxiao [1 ]
Singh, Manish Kumar [1 ]
Cai, Hong [1 ]
Porikli, Fatih [1 ]
Affiliations
[1] Qualcomm AI Research, San Diego, CA 92121, USA
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024
Keywords
LEARNING DEPTH; NETWORK; VISION
DOI
10.1109/CVPR52733.2024.01021
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network and sets it on par with the latest, complex transformer-based models. Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features. In addition, we propose normalization techniques to process the point cloud, which improves learning and leads to better accuracy than directly using point transformers off the shelf. Furthermore, we incorporate global attention on downsampled point cloud features, which enables long-range context while still being computationally feasible. We evaluate our method, DeCoTR, on established depth completion benchmarks, including NYU Depth V2 and KITTI, showcasing that it sets new state-of-the-art performance. We further conduct zero-shot evaluations on ScanNet and DDAD benchmarks and demonstrate that DeCoTR has superior generalizability compared to existing approaches.
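To make the feature-uplifting and point-cloud-normalization steps concrete, below is a minimal PyTorch sketch of unprojecting per-pixel 2D features into a 3D point cloud using the initial depth and camera intrinsics, followed by one plausible centering-and-scaling normalization. This is an illustrative assumption, not the paper's implementation: the function names, tensor shapes, and the specific normalization (zero-mean, unit max-radius) are all hypothetical.

    import torch

    def uplift_features_to_points(feat, depth, K):
        """Unproject per-pixel features into a 3D point cloud (sketch).

        feat:  (B, C, H, W) 2D feature map
        depth: (B, 1, H, W) initial depth estimate
        K:     (B, 3, 3) camera intrinsics
        Returns points (B, H*W, 3) and features (B, H*W, C).
        """
        B, C, H, W = feat.shape
        # Pixel grid in homogeneous coordinates: [u, v, 1]^T per pixel.
        v, u = torch.meshgrid(
            torch.arange(H, dtype=feat.dtype, device=feat.device),
            torch.arange(W, dtype=feat.dtype, device=feat.device),
            indexing="ij",
        )
        pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)  # (3, H*W)
        # Back-project: X = depth * K^{-1} [u, v, 1]^T
        rays = torch.linalg.inv(K) @ pix.unsqueeze(0)        # (B, 3, H*W)
        points = (rays * depth.reshape(B, 1, -1)).transpose(1, 2)  # (B, H*W, 3)
        feats = feat.reshape(B, C, -1).transpose(1, 2)             # (B, H*W, C)
        return points, feats

    def normalize_point_cloud(points, eps=1e-6):
        """Center each cloud at its centroid, scale to unit max radius (one
        plausible scheme; the paper's exact normalization may differ)."""
        centered = points - points.mean(dim=1, keepdim=True)
        scale = centered.norm(dim=-1).amax(dim=1, keepdim=True).clamp_min(eps)
        return centered / scale.unsqueeze(-1)

    if __name__ == "__main__":
        B, C, H, W = 2, 64, 60, 80
        feat = torch.randn(B, C, H, W)
        depth = torch.rand(B, 1, H, W) + 0.1
        K = torch.eye(3).repeat(B, 1, 1)
        pts, f = uplift_features_to_points(feat, depth, K)
        pts = normalize_point_cloud(pts)
        print(pts.shape, f.shape)  # torch.Size([2, 4800, 3]) torch.Size([2, 4800, 64])

Normalizing each cloud to a canonical position and scale keeps the point transformer's neighborhood queries and positional encodings well-conditioned across scenes with very different depth ranges, which is consistent with the abstract's observation that such preprocessing outperforms applying point transformers off the shelf.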
Pages: 10736-10746
Page count: 11