Learning Pixel-Wise Continuous Depth Representation via Clustering for Depth Completion

Cited by: 2

Authors:

Chen, Shenglun ^{[1]}

Zhang, Hong ^{[2]}

Ma, Xinzhu ^{[3]}

Wang, Zhihui ^{[4,5]}

Li, Haojie ^{[2]}

Affiliations:

[1] Dalian Univ Technol, Sch Software Technol, Dalian 116620, Peoples R China

[2] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao 266590, Peoples R China

[3] Shanghai AI Lab, Shanghai Artificial Intelligence Lab, Shanghai 200030, Peoples R China

[4] Dalian Univ Technol, DUT RU Int Sch Informat Sci & Engn, Dalian 116620, Peoples R China

[5] Dalian Univ Technol, Key Lab Ubiquitous Network & Serv Software Liaonin, Dalian 116620, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

Transformers; Feature extraction; Logistics; Estimation; Task analysis; Kernel; Circuits and systems; Depth completion; classification; clustering; offset estimation; NETWORK;

DOI:

10.1109/TCSVT.2024.3359190

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Depth completion is a long-standing challenge in computer vision, where classification-based methods have made tremendous progress in recent years. However, most existing classification-based methods rely on pre-defined, pixel-shared, discrete depth values as depth categories. This representation fails to capture the continuous depth values that conform to the real depth distribution, leading to depth smearing in boundary regions. To address this issue, we revisit depth completion from the clustering perspective and propose a novel clustering-based framework called CluDe, which focuses on learning a pixel-wise and continuous depth representation. The key idea of CluDe is to iteratively update the pixel-shared and discrete depth representation to its corresponding pixel-wise and continuous counterpart, driven by the real depth distribution. Specifically, CluDe first utilizes depth value clustering to learn a set of depth centers as the depth representation. While these depth centers are pixel-shared and discrete, they are more in line with the real depth distribution than pre-defined depth categories. Then, CluDe estimates offsets for these depth centers, enabling their dynamic adjustment along the depth axis of the depth distribution to generate the pixel-wise and continuous depth representation. Extensive experiments demonstrate that CluDe successfully reduces depth smearing around object boundaries by utilizing a pixel-wise and continuous depth representation. Furthermore, CluDe achieves state-of-the-art performance on the VOID datasets and outperforms classification-based methods on the KITTI dataset.
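The two steps described in the abstract (clustering depth values into shared centers, then shifting those centers per pixel with estimated offsets) can be illustrated with a minimal NumPy sketch. This is not the CluDe implementation: the function names `kmeans_1d` and `pixelwise_depth` are hypothetical, plain 1-D k-means stands in for whatever clustering the paper uses, the offsets and probabilities would come from a network that is not modeled here, and taking a probability-weighted sum as the final depth is an assumption the abstract does not state.

```python
import numpy as np

def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means (Lloyd's algorithm); returns sorted centers.

    Stand-in for "depth value clustering": the centers are shared by
    all pixels and discrete, but follow the data distribution.
    """
    values = np.asarray(values, dtype=float)
    # Initialise centers at evenly spaced quantiles of the data.
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old center if a cluster is empty
                centers[j] = members.mean()
    return np.sort(centers)

def pixelwise_depth(centers, offsets, probs):
    """Shift shared centers by per-pixel offsets, then take the expectation.

    centers: (K,) shared depth centers
    offsets: (H, W, K) per-pixel offsets (assumed to be a network output)
    probs:   (H, W, K) per-pixel weights over centers, summing to 1 over K
    """
    # Broadcasting turns the K shared centers into H*W*K pixel-wise centers.
    per_pixel_centers = centers[None, None, :] + offsets
    # Assumption: final depth is the probability-weighted sum of centers.
    return (probs * per_pixel_centers).sum(axis=-1)

# Toy usage: two depth modes, one pixel whose surface lies slightly
# behind the nearer shared center.
rng = np.random.default_rng(0)
depths = np.concatenate([rng.normal(2.0, 0.05, 500),
                         rng.normal(8.0, 0.05, 500)])
centers = kmeans_1d(depths, k=2)
offsets = np.zeros((1, 1, 2))
offsets[0, 0, 0] = 0.3
probs = np.array([[[1.0, 0.0]]])
depth_map = pixelwise_depth(centers, offsets, probs)
```

With zero offsets the output can only be a mixture of the shared centers, so a pixel on a boundary with weight split between two centers lands between the two surfaces. Per-pixel offsets let each pixel move its own centers onto the surface it actually belongs to, which is the intuition behind the reduced depth smearing that the abstract reports.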

Pages: 6303-6317

Page count: 15
