Joint Task-Recursive Learning for RGB-D Scene Understanding

Cited by: 15
Authors
Zhang, Zhenyu [1, 2]
Cui, Zhen [1, 2]
Xu, Chunyan [1, 2]
Jie, Zequn [3]
Li, Xiang [1, 2]
Yang, Jian [1, 2]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, PCA Lab, Minist Educ, Key Lab Intelligent Percept & Syst High Dimens In, Nanjing 210094, Peoples R China
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Jiangsu Key Lab Image & Video Understanding Socia, Nanjing 210094, Peoples R China
[3] Tencent AI Lab, Nanjing 210094, Peoples R China
Keywords
Task analysis; Estimation; Semantics; Image segmentation; Learning systems; Fuses; Cameras; Depth estimation; surface normal estimation; semantic segmentation; recursive learning; RGB-D scene understanding;
DOI
10.1109/TPAMI.2019.2926728
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
RGB-D scene understanding with a monocular camera is an emerging and challenging topic with many potential applications. In this paper, we propose a novel Task-Recursive Learning (TRL) framework to jointly and recurrently conduct three representative tasks: depth estimation, surface normal prediction, and semantic segmentation. TRL recursively refines the prediction results through a series of task-level interactions, where each one-time cross-task interaction is abstracted as one network block at one time stage. In each stage, we serialize multiple tasks into a sequence and then recursively perform their interactions. To adaptively enhance counterpart patterns, we encapsulate the interactions in a Task-Attentional Module (TAM) so that the tasks mutually boost each other. Across stages, the historical experience from previous task states is selectively propagated to the next stages by a Feature-Selection unit (FS-Unit), which takes advantage of complementary information across tasks. The sequence of task-level interactions also evolves along a coarse-to-fine scale space so that the required details can be refined progressively. Finally, the task-abstracted sequence problem of multi-task prediction is framed as a recursive network. Extensive experiments on the NYU-Depth v2 and SUN RGB-D datasets demonstrate that our method can recursively refine the results of the three tasks and achieves state-of-the-art performance.
Pages: 2608-2623
Page count: 16