Joint Task-Recursive Learning for RGB-D Scene Understanding

Times cited: 15

Authors:

Zhang, Zhenyu [1,2]

Cui, Zhen [1,2]

Xu, Chunyan [1,2]

Jie, Zequn [3]

Li, Xiang [1,2]

Yang, Jian [1,2]

Affiliations:

[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, PCA Lab, Minist Educ, Key Lab Intelligent Percept & Syst High Dimens In, Nanjing 210094, Peoples R China

[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Jiangsu Key Lab Image & Video Understanding Socia, Nanjing 210094, Peoples R China

[3] Tencent AI Lab, Nanjing 210094, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2020 / Vol. 42 / No. 10

Keywords:

Task analysis; Estimation; Semantics; Image segmentation; Learning systems; Fuses; Cameras; Depth estimation; surface normal estimation; semantic segmentation; recursive learning; RGB-D scene understanding

DOI:

10.1109/TPAMI.2019.2926728

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

RGB-D scene understanding with a monocular camera is an emerging and challenging topic with many potential applications. In this paper, we propose a novel Task-Recursive Learning (TRL) framework that jointly and recurrently conducts three representative tasks in this setting: depth estimation, surface normal prediction, and semantic segmentation. TRL recursively refines the prediction results through a series of task-level interactions, where each cross-task interaction is abstracted as one network block at one time stage. In each stage, we serialize the tasks into a sequence and then recursively perform their interactions. To adaptively enhance counterpart patterns, we encapsulate the interactions into a specific Task-Attentional Module (TAM) so that the tasks mutually boost each other. Across stages, the historical experiences of the tasks' previous states are selectively propagated into the next stages by a Feature-Selection unit (FS-Unit), which takes advantage of complementary information across tasks. The sequence of task-level interactions also evolves along a coarse-to-fine scale space, so that the required details can be refined progressively. Finally, the task-abstracted sequence problem of multi-task prediction is framed as a recursive network. Extensive experiments on the NYU-Depth v2 and SUN RGB-D datasets demonstrate that our method recursively refines the results of the three tasks and achieves state-of-the-art performance.
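The abstract describes TRL's control flow only at a high level: tasks are serialized within a stage, each task interacts with the one before it, a selection step carries state across stages, and stages move from coarse to fine scales. The Python sketch below illustrates only that flow, under loudly stated assumptions. Features are toy 1-D lists. `task_attention` and `feature_selection` are hypothetical stand-ins for TAM and FS-Unit (a sigmoid self-gated addition and a sigmoid mixing gate), not the paper's actual equations. The task order `depth`, `normal`, `segmentation` is assumed. `upsample` is a hypothetical stand-in for the coarse-to-fine scale progression.

```python
import math
from typing import Dict, List

Vector = List[float]

# Assumed serialization order of the three tasks within one stage (hypothetical).
TASKS = ["depth", "normal", "segmentation"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def task_attention(target: Vector, source: Vector) -> Vector:
    """Hypothetical stand-in for TAM (not the paper's formulation): the source
    task's features are weighted element-wise by sigmoid(source) and added to
    the target task's features."""
    return [t + sigmoid(s) * s for t, s in zip(target, source)]


def feature_selection(current: Vector, history: Vector) -> Vector:
    """Hypothetical stand-in for FS-Unit (not the paper's formulation): an
    element-wise sigmoid gate decides how much of the current state versus the
    previous stage's state to keep."""
    out = []
    for c, h in zip(current, history):
        g = sigmoid(c - h)
        out.append(g * c + (1.0 - g) * h)  # convex combination of c and h
    return out


def upsample(v: Vector) -> Vector:
    """Coarse-to-fine stand-in: nearest-neighbour 2x upsampling of a 1-D feature."""
    return [x for x in v for _ in range(2)]


def trl_forward(initial: Dict[str, Vector], num_stages: int) -> Dict[str, Vector]:
    """Run the serialized task recursion for num_stages coarse-to-fine stages."""
    states = dict(initial)
    for stage in range(num_stages):
        if stage > 0:
            # Move every task state to the next (finer) scale.
            states = {k: upsample(v) for k, v in states.items()}
        history = dict(states)  # task states carried over from the previous stage
        prev_task = TASKS[-1]   # first task in a stage uses the last task's latest state
        for task in TASKS:
            fused = task_attention(states[task], states[prev_task])
            states[task] = feature_selection(fused, history[task])
            prev_task = task
    return states


if __name__ == "__main__":
    init = {t: [0.1, -0.2, 0.3] for t in TASKS}
    out = trl_forward(init, num_stages=3)
    print({k: len(v) for k, v in out.items()})
    # prints {'depth': 12, 'normal': 12, 'segmentation': 12}
```

In this toy version, each task reads the most recently updated state of the preceding task within the same stage, which mirrors the "serialize, then interact" idea. The gate in `feature_selection` always yields a convex combination of the new and historical states, so the selection step can only interpolate between them.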

Pages: 2608-2623

Page count: 16

References: 71 in total

