TSNet: Three-Stream Self-Attention Network for RGB-D Indoor Semantic Segmentation

Cited by: 114
Authors
Zhou, Wujie [1]
Yuan, Jianzhong [1]
Lei, Jingsheng [1]
Luo, Ting [2]
Affiliations
[1] Zhejiang Univ Sci & Technol, Hangzhou 310023, Peoples R China
[2] Ningbo Univ, Ningbo 315211, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Indoor semantic segmentation; RGB-D; Self-attention; Three-stream;
DOI
10.1109/MIS.2020.2999462
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This article proposes a three-stream self-attention network (TSNet) for indoor semantic segmentation comprising two asymmetric input streams (asymmetric encoder structure) and a cross-modal distillation stream with a self-attention module. The two asymmetric input streams are ResNet34 for the red-green-blue (RGB) stream and VGGNet16 for the depth stream. Accompanying the RGB and depth streams, a cross-modal distillation stream with a self-attention module extracts new RGB plus depth features in each level in the bottom-up path. In addition, while using bilinear upsampling to recover the spatial resolution of the feature map, we incorporated the feature information of both the RGB flow and the depth flow through the self-attention module. We constructed the NYU Depth V2 dataset to evaluate the TSNet and achieved results comparable to those of current state-of-the-art methods.
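The bilinear upsampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-channel feature map stored as a list of rows and corner-aligned sampling, and the function name `bilinear_upsample` is hypothetical.

```python
def bilinear_upsample(fmap, out_h, out_w):
    """Resize a 2-D feature map (list of rows) with bilinear interpolation.

    Assumption: corner-aligned sampling, so the four corner values of the
    input are reproduced exactly in the output.
    """
    in_h, in_w = len(fmap), len(fmap[0])
    # Scale factors that map output indices onto input coordinates.
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * sy
        y0 = int(y)                   # row above the sample point
        y1 = min(y0 + 1, in_h - 1)    # row below, clamped at the border
        wy = y - y0                   # vertical interpolation weight
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = int(x)               # column left of the sample point
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0               # horizontal interpolation weight
            top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
            bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Toy 2x2 map upsampled to 3x3; the new centre value is the mean of the corners.
coarse = [[0.0, 2.0],
          [4.0, 6.0]]
fine = bilinear_upsample(coarse, 3, 3)
```

In a multi-channel network the same interpolation would be applied to each channel independently before the upsampled map is combined with the RGB and depth features.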
Pages: 73-78
Number of pages: 6