Self-Supervised Monocular Depth Estimation With Self-Perceptual Anomaly Handling

Cited by: 7

Authors:

Zhang, Yourun [1]

Gong, Maoguo [1]

Zhang, Mingyang [1]

Li, Jianzhao [1]

Affiliation:

[1] Xidian Univ, Key Lab Collaborat Intelligence Syst, Minist Educ, Xian 710071, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024 / Vol. 35 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Monocular depth estimation; self-supervised learning; structure from motion; view synthesis; NETWORK; STEREO;

DOI:

10.1109/TNNLS.2023.3301711

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Extracting plausible 3-D information from a single 2-D image is attractive, and self-supervised learning has shown impressive potential in this field. However, when only monocular videos are available as training data, objects moving at speeds similar to the camera's can disturb the reprojection process during training. Existing methods filter out some moving pixels by comparing pixelwise photometric errors, but illumination inconsistency between frames leaves the filtering incomplete. In addition, existing methods calculate the photometric error within local windows, so even a masked-out anomalous pixel can still implicitly disturb the reprojection process whenever it lies in the local neighborhood of a nonanomalous pixel. Moreover, the ill-posed nature of monocular depth estimation means that the same scene corresponds to multiple plausible depth maps, which damages the robustness of the model. To alleviate these problems, we propose: 1) a self-reprojection mask to further filter out moving objects while avoiding illumination inconsistency; 2) a self-statistical mask method to prevent the filtered anomalous pixels from implicitly disturbing the reprojection; and 3) a self-distillation augmentation consistency loss to reduce the impact of the ill-posed nature of monocular depth estimation. Our method shows superior performance on the KITTI dataset, especially when evaluating only the depth of potential moving objects.
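
The abstract mentions that existing methods filter moving pixels by comparing pixelwise photometric error. The sketch below illustrates one common form of that baseline idea, not the masks proposed in this paper: a pixel is kept only if warping the source frame lowers its error relative to the unwarped source. The function names are hypothetical, the error is plain L1 (an assumption; practical pipelines often mix in a structural term), and `warped_source` is assumed to have been produced elsewhere from predicted depth and pose.

```python
import numpy as np

def photometric_error(a, b):
    """Per-pixel L1 error between two H x W x C images, averaged over channels."""
    return np.abs(a.astype(np.float64) - b.astype(np.float64)).mean(axis=-1)

def static_pixel_mask(target, warped_source, source):
    """Boolean H x W mask of pixels where warping reduces the error.

    A pixel that looks the same in the raw source and the target (for
    example, an object moving at the camera's speed) has a small identity
    error, so it fails the comparison and is masked out.
    """
    reproj_err = photometric_error(target, warped_source)
    identity_err = photometric_error(target, source)
    return reproj_err < identity_err

def masked_photometric_loss(target, warped_source, source):
    """Mean reprojection error over the pixels kept by the mask."""
    mask = static_pixel_mask(target, warped_source, source)
    if not mask.any():
        return 0.0  # nothing passed the filter
    err = photometric_error(target, warped_source)
    return float(err[mask].mean())
```

Because the comparison is made pixel by pixel on raw intensities, a brightness change between frames can shift either error term, which is the incompleteness the abstract attributes to this kind of filtering.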

Pages: 17292-17306

Page count: 15
