3D Neighborhood Convolution: Learning Depth-Aware Features for RGB-D and RGB Semantic Segmentation

Cited by: 20

Authors:

Chen, Yunlu [1]

Mensink, Thomas [2]

Gavves, Efstratios [1]

Affiliations:

[1] University of Amsterdam, Amsterdam, Netherlands

[2] Google Research, Amsterdam, Netherlands

Source:

2019 International Conference on 3D Vision (3DV 2019) | 2019


DOI:

10.1109/3DV.2019.00028

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A key challenge for RGB-D segmentation is how to effectively incorporate 3D geometric information from the depth channel into 2D appearance features. We propose to model the effective receptive field of a 2D convolution based on scale and locality in the 3D neighborhood. Standard convolutions are local in image space (u, v), often with a fixed receptive field of 3x3 pixels. We instead define convolutions that are local with respect to the corresponding point in 3D real-world space (x, y, z): the depth channel is used to adapt the receptive field of the convolution, which makes the resulting filters invariant to scale and focuses them on a certain range of depth. We introduce 3D Neighborhood Convolution (3DN-Conv), a convolutional operator around 3D neighborhoods. Further, by substituting estimated depth, our RGB-D semantic segmentation model can be applied to RGB-only input. Experimental results validate that the proposed 3DN-Conv operator improves semantic segmentation, using either ground-truth depth (RGB-D) or estimated depth (RGB).
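The locality idea in the abstract can be illustrated with a small NumPy sketch. This is not the paper's 3DN-Conv operator, only a simplified stand-in under stated assumptions: a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, a single-channel feature map, a fixed 3x3 window, and a hard cutoff `radius` (also hypothetical) on the 3D distance between a pixel and its neighbor. The scale-adaptive part of the receptive field is not modeled here.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift each pixel (u, v) with depth z to a 3D point (x, y, z), pinhole model."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)

def neighborhood_conv(feat, depth, kernel, fx, fy, cx, cy, radius):
    """3x3 correlation in which a tap contributes only if the neighbor's
    3D point lies within `radius` of the center pixel's 3D point.
    Assumption for illustration: hard mask, edge padding, one channel."""
    h, w = feat.shape
    pts = backproject(depth, fx, fy, cx, cy)
    pad_f = np.pad(feat, 1, mode="edge")
    pad_p = np.pad(pts, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dv in range(3):
        for du in range(3):
            nb_f = pad_f[dv:dv + h, du:du + w]   # neighbor features
            nb_p = pad_p[dv:dv + h, du:du + w]   # neighbor 3D points
            dist = np.linalg.norm(nb_p - pts, axis=-1)
            mask = (dist <= radius).astype(float)
            out += kernel[dv, du] * mask * nb_f
    return out

# Toy scene: left half at depth 1, right half at depth 5 (a depth edge).
feat = np.ones((4, 4))
depth = np.array([[1.0, 1.0, 5.0, 5.0]] * 4)
out = neighborhood_conv(feat, depth, np.ones((3, 3)),
                        fx=100.0, fy=100.0, cx=2.0, cy=2.0, radius=0.5)
print(out[0])  # [9. 6. 6. 9.]
```

In this toy case the pixels next to the depth edge sum only 6 of their 9 taps, because the three neighbors on the other side of the edge are far away in 3D even though they are adjacent in the image.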

Citation

Pages: 173-182

Page count: 10

