RFNet: Reverse Fusion Network With Attention Mechanism for RGB-D Indoor Scene Understanding

Cited by: 8

Authors:

Zhou, Wujie [1]

Lv, Sijia [1]

Lei, Jingsheng [1]

Luo, Ting [2]

Yu, Lu [3]

Affiliations:

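[1] Zhejiang University of Science and Technology, School of Information and Electronic Engineering, Hangzhou 310023, People's Republic of China

[2] Ningbo University, College of Science and Technology, Ningbo 315211, People's Republic of China

[3] Zhejiang University, Institute of Information and Communication Engineering, Hangzhou 310027, People's Republic of China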

Source:

IEEE Transactions on Emerging Topics in Computational Intelligence | 2023, Vol. 7, No. 2

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Semantics; Image segmentation; Sun; Computer architecture; Data mining; Computational intelligence; RGB-D; indoor scene understanding; reverse fusion network; attention mechanism;

DOI:

10.1109/TETCI.2022.3160720

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

RGB-D indoor multiclass scene understanding is a pixelwise task that interprets RGB-D images using depth information to improve the RGB features for higher performance. We propose a novel asymmetric encoder structure for RGB-D indoor scene understanding that uses a reverse fusion network (RFNet) with an attention mechanism and a simplified feature extraction block. Specifically, the pre-trained ResNet34 and VGG16 networks (two asymmetric input streams) are used as the backbone for the information extraction paths as well as additive fusion and attention modules that further enhance network performance. The strong feature extraction ability of classical networks and the advantages of two-way reverse fusion enable this novel semantic segmentation network to narrow the gap between low- and high-level features, such that the features are better merged for segmentation. We achieved segmentation performances (MIoU) of 53.5% and 50.7% on the SUN RGB-D and NYUDv2 datasets, respectively, thereby outperforming other state-of-the-art approaches.
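The MIoU figures quoted in the abstract are mean intersection-over-union scores. As a rough illustration of how such a score is typically computed, here is a minimal sketch in plain Python. The function name `mean_iou`, the flat-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for this example; the record does not describe the authors' exact evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for flat lists of per-pixel class labels.

    Assumption: classes that appear in neither `pred` nor `target`
    are skipped rather than counted as an IoU of zero.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)

    # Average the per-class IoU values; return 0.0 if no class was present.
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    print(mean_iou(predicted, ground_truth, num_classes=2))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Benchmark evaluations usually accumulate a confusion matrix over the whole test set before taking the per-class ratio, rather than averaging per image.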


Pages: 598-603

Page count: 6

