RFNet: Reverse Fusion Network With Attention Mechanism for RGB-D Indoor Scene Understanding

Cited by: 8
Authors
Zhou, Wujie [1]
Lv, Sijia [1]
Lei, Jingsheng [1]
Luo, Ting [2]
Yu, Lu [3]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] Ningbo Univ, Coll Sci & Technol, Ningbo 315211, Peoples R China
[3] Zhejiang Univ, Inst Informat & Commun Engn, Hangzhou 310027, Peoples R China
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2023 / Vol. 7 / No. 2
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Semantics; Image segmentation; Sun; Computer architecture; Data mining; Computational intelligence; RGB-D; indoor scene understanding; reverse fusion network; attention mechanism;
DOI
10.1109/TETCI.2022.3160720
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
RGB-D indoor multiclass scene understanding is a pixelwise labeling task in which depth information is used to enhance RGB features and thereby improve segmentation performance. We propose a novel asymmetric encoder structure for RGB-D indoor scene understanding that uses a reverse fusion network (RFNet) with an attention mechanism and a simplified feature extraction block. Specifically, pre-trained ResNet34 and VGG16 networks serve as the backbones of the two asymmetric input streams (the information extraction paths), and additive fusion and attention modules further enhance network performance. The strong feature extraction ability of these classical networks, combined with the advantages of two-way reverse fusion, enables this semantic segmentation network to narrow the gap between low- and high-level features so that they are merged more effectively for segmentation. We achieve mean intersection-over-union (MIoU) scores of 53.5% and 50.7% on the SUN RGB-D and NYUDv2 datasets, respectively, outperforming other state-of-the-art approaches.
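The results above are reported as MIoU. As a hedged illustration, and not the authors' evaluation code, the sketch below computes mean intersection-over-union from a confusion matrix using the common definition IoU_c = TP_c / (TP_c + FP_c + FN_c), averaged over classes. The helper names `confusion_matrix` and `mean_iou`, the `ignore_index` convention for unlabeled pixels, and the choice to skip classes absent from both ground truth and prediction are assumptions made for illustration; the paper's exact protocol may differ.

```python
from typing import List, Optional, Sequence

# Hypothetical helpers: a sketch of the standard MIoU metric,
# not the RFNet authors' evaluation code.


def confusion_matrix(labels: Sequence[int], preds: Sequence[int], num_classes: int,
                     ignore_index: Optional[int] = None) -> List[List[int]]:
    """Count (ground-truth, prediction) pixel pairs.

    Rows index the ground-truth class, columns the predicted class.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for gt, pr in zip(labels, preds):
        if ignore_index is not None and gt == ignore_index:
            continue  # skip unlabeled pixels (assumed convention)
        cm[gt][pr] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average TP / (TP + FP + FN) over classes that occur in GT or prediction."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c pixels labeled as another class
        fp = sum(cm[r][c] for r in range(n)) - tp     # other-class pixels labeled as c
        denom = tp + fp + fn
        if denom > 0:                                 # assumed: skip classes never seen
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with ground truth [0, 0, 1, 1, 2, 2] and prediction [0, 1, 1, 1, 2, 0] over three classes, the per-class IoUs are 1/3, 2/3 and 1/2, so the MIoU is 0.5. Benchmark evaluations commonly accumulate a single confusion matrix over the whole test set before averaging across classes, rather than averaging per-image scores.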
Pages: 598-603
Number of pages: 6
Related papers
46 in total
  • [1] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
    Badrinarayanan, Vijay
    Kendall, Alex
    Cipolla, Roberto
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) : 2481 - 2495
  • [2] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
    Chen, Liang-Chieh
    Papandreou, George
    Kokkinos, Iasonas
    Murphy, Kevin
    Yuille, Alan L.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) : 834 - 848
  • [3] Deng L., 2019, RFBNET DEEP MULTIMOD
  • [4] Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture
    Eigen, David
    Fergus, Rob
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2650 - 2658
  • [5] The PASCAL Visual Object Classes Challenge: A Retrospective
    Everingham, Mark
    Eslami, S. M. Ali
    Van Gool, Luc
    Williams, Christopher K. I.
    Winn, John
    Zisserman, Andrew
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2015, 111 (01) : 98 - 136
  • [6] Learning Rich Features from RGB-D Images for Object Detection and Segmentation
    Gupta, Saurabh
    Girshick, Ross
    Arbelaez, Pablo
    Malik, Jitendra
    [J]. COMPUTER VISION - ECCV 2014, PT VII, 2014, 8695 : 345 - 360
  • [7] FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture
    Hazirbas, Caner
    Ma, Lingni
    Domokos, Csaba
    Cremers, Daniel
    [J]. COMPUTER VISION - ACCV 2016, PT I, 2017, 10111 : 213 - 228
  • [8] He K., 2016, 2016 IEEE C COMP VIS, DOI 10.1109/CVPR.2016.90
  • [9] STD2P: RGBD Semantic Segmentation using Spatio-Temporal Data-Driven Pooling
    He, Yang
    Chiu, Wei-Chen
    Keuper, Margret
    Fritz, Mario
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 7158 - 7167
  • [10] Evaluation of Multimodal Semantic Segmentation using RGB-D Data
    Hu, Jiesi
    Zhao, Ganning
    You, Suya
    Kuo, C. C. Jay
    [J]. ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS III, 2021, 11746