Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation

Cited by: 1
Authors
Liu, Li [1 ]
Zhu, Ruijie [2 ]
Deng, Jiacheng [2 ]
Song, Ziyang [2 ]
Yang, Wenfei [2 ,3 ]
Zhang, Tianzhu [2 ,3 ]
Affiliations
[1] Univ Sci & Technol China, Inst Adv Technol, Hefei 230027, Peoples R China
[2] Univ Sci & Technol China, Sch Informat Sci, Hefei 230027, Peoples R China
[3] Deep Space Explorat Lab, Hefei 230088, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Estimation; Adaptation models; Cameras; Aggregates; Predictive models; Feature extraction; Circuits and systems; Modulation; Generators; Transformers; Monocular depth estimation; plane guidance; transformer; dense prediction;
DOI
10.1109/TCSVT.2024.3476952
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Code
0808 ; 0809 ;
Abstract
Monocular depth estimation aims to infer a dense depth map from a single image, a fundamental and prevalent task in computer vision. Many previous works have shown impressive depth estimation results through carefully designed network structures, but they usually ignore planar information and therefore perform poorly in low-texture areas of indoor scenes. In this paper, we propose Plane2Depth, which adaptively utilizes plane information to improve depth prediction within a hierarchical framework. Specifically, in the proposed plane guided depth generator (PGDG), we design a set of plane queries as prototypes to softly model planes in the scene and predict per-pixel plane coefficients. The predicted plane coefficients can then be converted into metric depth values with the pinhole camera model. In the proposed adaptive plane query aggregation (APGA) module, we introduce a novel feature interaction approach to improve the aggregation of multi-scale plane features in a top-down manner. Extensive experiments show that our method achieves outstanding performance, especially in low-texture or repetitive areas. Furthermore, under the same backbone network, our method outperforms the state-of-the-art methods on the NYU-Depth-v2 dataset, achieves competitive results with state-of-the-art methods on the KITTI dataset, and generalizes effectively to unseen scenes.
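The conversion from plane coefficients to metric depth mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common per-pixel parameterization in which a plane vector n satisfies n · X = 1 for 3D points X on the plane, so depth follows as d(u, v) = 1 / (n · K⁻¹ [u, v, 1]ᵀ); the function name and array layout are likewise assumptions.

```python
import numpy as np

def plane_to_depth(plane_coeffs, K):
    """Convert per-pixel plane coefficients to a metric depth map
    via the pinhole camera model.

    plane_coeffs: (H, W, 3) per-pixel plane vectors n (with n . X = 1)
    K:            (3, 3) camera intrinsics
    returns:      (H, W) depth map
    """
    H, W, _ = plane_coeffs.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates [u, v, 1] for every pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project pixels to camera rays: K^{-1} [u, v, 1]^T.
    rays = pix @ np.linalg.inv(K).T
    # d = 1 / (n . K^{-1} p); clamp the denominator to avoid division by zero.
    denom = np.einsum('hwc,hwc->hw', plane_coeffs, rays)
    return 1.0 / np.clip(denom, 1e-6, None)

# Toy check: a fronto-parallel plane at depth 2 m has n = (0, 0, 0.5),
# since n . X = 1 gives 0.5 * Z = 1, i.e. Z = 2.
K = np.array([[500., 0., 32.], [0., 500., 24.], [0., 0., 1.]])
coeffs = np.zeros((48, 64, 3))
coeffs[..., 2] = 0.5
depth = plane_to_depth(coeffs, K)
print(depth.mean())  # ~2.0 everywhere
```

In this parameterization a network can regress the three plane coefficients unconstrained, and the pinhole model turns them into depth differentiably, which is what makes plane-based supervision convenient.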
Pages: 1136-1149
Page count: 14