TSNet: Three-Stream Self-Attention Network for RGB-D Indoor Semantic Segmentation

Cited by: 114

Authors:

Zhou, Wujie [1]

Yuan, Jianzhong [1]

Lei, Jingsheng [1]

Luo, Ting [2]

Affiliations:

[1] Zhejiang University of Science and Technology, Hangzhou 310023, People's Republic of China

[2] Ningbo University, Ningbo 315211, People's Republic of China

Source:

IEEE INTELLIGENT SYSTEMS | 2021 / Volume 36 / Issue 04

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation

Keywords:

Indoor semantic segmentation; RGB-D; Self-attention; Three-stream

DOI:

10.1109/MIS.2020.2999462

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This article proposes a three-stream self-attention network (TSNet) for indoor semantic segmentation. TSNet comprises two asymmetric input streams (an asymmetric encoder structure) and a cross-modal distillation stream with a self-attention module. The two input streams use ResNet34 for the red-green-blue (RGB) stream and VGGNet16 for the depth stream. Alongside the RGB and depth streams, the cross-modal distillation stream with its self-attention module extracts new RGB-plus-depth features at each level of the bottom-up path. In addition, while using bilinear upsampling to recover the spatial resolution of the feature map, we incorporated the feature information of both the RGB stream and the depth stream through the self-attention module. We evaluated TSNet on the NYU Depth V2 dataset and achieved results comparable to those of current state-of-the-art methods.
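The abstract names two generic building blocks, self-attention and bilinear upsampling, without giving their exact form. The sketch below is only an illustration of those two operations on tiny single-channel maps, not the authors' module. Everything in it is an assumption: the function names (`self_attention`, `bilinear_upsample`, `fuse_and_upsample`) are hypothetical, the attention uses identity query/key/value projections instead of learned ones, and averaging the two attended channels is a stand-in for whatever fusion TSNet actually applies.

```python
import math


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    A trained module would learn the projections; identity keeps this short.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        # Similarity of this position to every position, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        # Numerically stable softmax over the scores.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output token is the attention-weighted average of all tokens.
        attended.append([sum(w * tok[c] for w, tok in zip(weights, tokens))
                         for c in range(dim)])
    return attended


def bilinear_upsample(fmap, scale):
    """Upsample a 2-D map (list of rows) by an integer factor.

    Uses corner-aligned bilinear interpolation, so corner values are preserved.
    """
    h, w = len(fmap), len(fmap[0])
    out_h, out_w = h * scale, w * scale
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional source row.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
            bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def fuse_and_upsample(rgb, depth, scale=2):
    """Hypothetical fusion step: attend over [rgb, depth] pixels, then upsample."""
    height, width = len(rgb), len(rgb[0])
    # One token per pixel, holding that pixel's RGB-stream and depth-stream value.
    tokens = [[rgb[i][j], depth[i][j]]
              for i in range(height) for j in range(width)]
    attended = self_attention(tokens)
    # Collapse the two attended channels by averaging (an assumption).
    fused = [[sum(attended[i * width + j]) / 2 for j in range(width)]
             for i in range(height)]
    return bilinear_upsample(fused, scale)


rgb_features = [[0.2, 0.4], [0.6, 0.8]]
depth_features = [[1.0, 0.5], [0.5, 1.0]]
result = fuse_and_upsample(rgb_features, depth_features, scale=2)
print(len(result), len(result[0]))  # 4 4
```

Pure Python lists are used so the sketch runs without dependencies; a practical version would operate on multi-channel tensors in a deep learning framework.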

Pages: 73-78

Page count: 6
