Using full-scale feature fusion for self-supervised indoor depth estimation

Citations: 0
Authors
Deqiang Cheng
Junhui Chen
Chen Lv
Chenggong Han
He Jiang
Affiliation
[1] School of Information and Control Engineering, China University of Mining and Technology
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Monocular depth estimation; Feature fusion; Self-supervised; Indoor scenes; ResNeSt
DOI
Not available
Abstract
Monocular depth estimation is a crucial task in computer vision, and self-supervised algorithms are gaining popularity because they do not require expensive ground-truth supervision. However, current self-supervised algorithms can produce inaccurate estimates and distorted boundaries when applied to indoor scenes. Combining multi-scale features is an established direction in image segmentation for improving accuracy and resolving boundary distortion, yet few indoor self-supervised methods have explored it. To address this, we propose a novel full-scale feature fusion approach consisting of a full-scale skip-connection and a full-scale feature fusion block. During encoding and decoding, this approach aggregates high-level and low-level information from feature maps at all scales, compensating for the cross-layer feature information the network would otherwise lose. The proposed full-scale feature fusion improves accuracy while reducing the number of decoder parameters. To fully exploit the full-scale feature fusion module, we also replace the ResNet encoder backbone with the more advanced ResNeSt. Together, these two changes yield a significant improvement in prediction accuracy. We extensively evaluate our approach on the indoor benchmark datasets NYU Depth V2 and ScanNet. Experimental results demonstrate that our method outperforms existing algorithms, particularly on NYU Depth V2, where accuracy rises to 83.8%.
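To make the fusion idea concrete, below is a minimal PyTorch sketch of a full-scale feature fusion block in the spirit the abstract describes: every encoder scale is projected to a common channel width, resampled to the target decoder resolution, concatenated, and fused by a convolution. The class name, channel widths, and fusion recipe are illustrative assumptions, not the authors' exact implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FullScaleFusionBlock(nn.Module):
        # Aggregates encoder feature maps from every scale at one decoder
        # level: project each map to a shared channel width, resample it
        # to the target resolution, concatenate, and fuse with a 3x3 conv.
        # (Hypothetical layer sizes for illustration only.)
        def __init__(self, in_channels_list, mid_channels=64):
            super().__init__()
            self.projs = nn.ModuleList(
                [nn.Conv2d(c, mid_channels, kernel_size=1)
                 for c in in_channels_list])
            self.fuse = nn.Sequential(
                nn.Conv2d(mid_channels * len(in_channels_list), mid_channels,
                          kernel_size=3, padding=1),
                nn.BatchNorm2d(mid_channels),
                nn.ReLU(inplace=True))

        def forward(self, feats, target_hw):
            # Up- or down-sample every scale to the decoder level's size.
            resized = [F.interpolate(p(f), size=target_hw, mode="bilinear",
                                     align_corners=False)
                       for p, f in zip(self.projs, feats)]
            return self.fuse(torch.cat(resized, dim=1))

    # Example: five encoder scales fused at a 1/4-resolution decoder level.
    feats = [torch.randn(1, c, 64 // 2 ** i, 64 // 2 ** i)
             for i, c in enumerate([64, 128, 256, 512, 1024])]
    block = FullScaleFusionBlock([64, 128, 256, 512, 1024])
    out = block(feats, target_hw=(32, 32))   # -> (1, 64, 32, 32)

A ResNeSt encoder producing the multi-scale inputs could, for instance, be obtained from the timm library via timm.create_model("resnest50d", features_only=True), though the paper does not specify how its backbone is constructed.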
Pages: 28215-28233
Page count: 18