Self-Supervised Monocular Depth Estimation with Effective Feature Fusion and Self Distillation

Times cited: 0

Authors:

Liu, Zhenfei [1,2,3]

Song, Chengqun [1,2,3]

Cheng, Jun [1,2,3]

Luo, Jiefu [1,2,3]

Wang, Xiaoyang [1]

Affiliations:

[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Prov Key Lab Robot & Intelligent Syst, Shenzhen, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China

Source:

2024 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS 2024 | 2024

Keywords:

Monocular Depth Estimation; Self-Supervised Learning; Generalization; Self Distillation;

DOI:

10.1109/IROS58592.2024.10802237

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Monocular depth estimation, which recovers scene depth from a single image, is an important task in computer vision. Because convolutional networks are limited in modeling long-range dependencies and available datasets are underutilized, existing models generalize unsatisfactorily. In this paper, we propose an adaptive backbone named Internal Fusion Transformer, which generalizes better than convolutional backbones such as HRNet, and a Bilateral Attention module, which attends more to low-level semantic features than previous fusion methods. We also introduce three data augmentation methods for self distillation, namely cropping-resizing (cr), cropping-shuffling (cs), and mirroring (mi), and analyze how each contributes to the performance improvement. Our model is trained on the KITTI dataset and, without fine-tuning, tested on the NYUv2 and Make3D datasets to evaluate its generalization. The experimental results demonstrate the effectiveness of our design. On the KITTI dataset, our model also outperforms the other models compared.
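The abstract names three augmentations used for self distillation but does not define them. The Python sketch below is one possible reading, written with NumPy: mirroring is taken to be a horizontal flip, cropping-resizing stretches a crop back to full resolution, and cropping-shuffling is assumed here to cut the image at one column and swap the two pieces. It also assumes a common self-distillation setup in which a teacher's depth for the original image, passed through the same transform, serves as the target for the student's depth on the augmented image. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical sketches of the three augmentations named in the abstract.
# The exact definitions used by the authors are not given in this record.


def mirror(x: np.ndarray) -> np.ndarray:
    """mi (assumed): flip an (H, W) or (H, W, C) array left-to-right."""
    return x[:, ::-1].copy()


def crop_resize(x: np.ndarray, top: int, left: int, h: int, w: int) -> np.ndarray:
    """cr (assumed): cut an h x w window and stretch it back to full size (nearest neighbour)."""
    H, W = x.shape[:2]
    crop = x[top:top + h, left:left + w]
    rows = np.arange(H) * h // H  # source row for each output row
    cols = np.arange(W) * w // W  # source column for each output column
    return crop[rows][:, cols]


def crop_shuffle(x: np.ndarray, split: int) -> np.ndarray:
    """cs (assumed form): cut at column `split` and swap the left and right pieces."""
    return np.concatenate([x[:, split:], x[:, :split]], axis=1)


def self_distillation_loss(student_depth: np.ndarray,
                           teacher_depth: np.ndarray,
                           transform) -> float:
    """Mean L1 gap between the student's depth on the augmented image and the
    teacher's depth on the original image, warped by the same transform."""
    target = transform(teacher_depth)  # pseudo-label aligned with the augmented view
    return float(np.mean(np.abs(student_depth - target)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((4, 6, 3))  # toy RGB image, H=4, W=6
    teacher = rng.random((4, 6))   # toy teacher depth for the original image

    # Compose two augmentations; the same function warps image and depth.
    aug = lambda a: crop_shuffle(mirror(a), split=2)
    augmented_image = aug(image)   # the view the student would receive
    print(augmented_image.shape)   # (4, 6, 3)

    perfect_student = aug(teacher)  # a student that matches the warped teacher exactly
    print(self_distillation_loss(perfect_student, teacher, aug))  # 0.0
```

The sketch treats all three augmentations as purely geometric warps of the pixel grid, so one function can be applied to both the image and the teacher's depth map. Whether the authors also rescale depth values under cropping-resizing, or use a separate teacher network rather than the same network on the unaugmented view, is not stated in this record.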


Pages: 7160–7166

Page count: 7
