Joint Task-Recursive Learning for RGB-D Scene Understanding

Cited by: 15

Authors:

Zhang, Zhenyu [1,2]

Cui, Zhen [1,2]

Xu, Chunyan [1,2]

Jie, Zequn [3]

Li, Xiang [1,2]

Yang, Jian [1,2]

Affiliations:

[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, PCA Lab, Minist Educ, Key Lab Intelligent Percept & Syst High Dimens In, Nanjing 210094, Peoples R China

[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Jiangsu Key Lab Image & Video Understanding Socia, Nanjing 210094, Peoples R China

[3] Tencent AI Lab, Nanjing 210094, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2020 / Vol. 42 / No. 10

Keywords:

Task analysis; Estimation; Semantics; Image segmentation; Learning systems; Fuses; Cameras; Depth estimation; surface normal estimation; semantic segmentation; recursive learning; RGB-D scene understanding;

DOI:

10.1109/TPAMI.2019.2926728

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

RGB-D scene understanding with a monocular camera is an emerging and challenging topic with many potential applications. In this paper, we propose a novel Task-Recursive Learning (TRL) framework to jointly and recurrently conduct three representative tasks: depth estimation, surface normal prediction and semantic segmentation. TRL recursively refines the prediction results through a series of task-level interactions, where each cross-task interaction is abstracted as one network block at one time stage. In each stage, we serialize multiple tasks into a sequence and then recursively perform their interactions. To adaptively enhance counterpart patterns, we encapsulate interactions into a specific Task-Attentional Module (TAM) so that the tasks mutually boost each other. Across stages, the historical experiences of previous task states are selectively propagated into the next stages by a Feature-Selection unit (FS-Unit), which takes advantage of complementary information across tasks. The sequence of task-level interactions also evolves along a coarse-to-fine scale space so that the required details can be refined progressively. Finally, the task-abstracted sequence problem of multi-task prediction is framed as a recursive network. Extensive experiments on the NYU-Depth v2 and SUN RGB-D datasets demonstrate that our method can recursively refine the results of the three tasks and achieves state-of-the-art performance.

Citation

Pages: 2608-2623

Page count: 16

