RTLNet: Recursive Triple-Path Learning Network for Scene Parsing of RGB-D Images

Cited by: 4
Authors
Yue, Yuchun [1]
Zhou, Wujie [1]
Lei, Jingsheng [1]
Yu, Lu [2]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] Zhejiang Univ, Coll Informat & Elect Engn, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image segmentation; Semantics; Decoding; Training; Streaming media; Sensors; Feature extraction; Scene parsing; cross-modality fusion; multiscale feature fusion; recursive learning; deep learning;
DOI
10.1109/LSP.2021.3139567
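Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract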
Scene parsing approaches have attracted extensive attention in recent years; although several methods have been developed for scene parsing, most include complex modules for both cross-modality fusion between RGB and depth images in the encoder and image scale level recovery in the decoder under label supervision for high inference accuracy. Cross-modality information in the encoder may be diluted when processed through the decoder, and the supervision results may not be reused effectively, which adversely affects scene parsing. To address these problems, we propose a recursive triple-path learning network (RTLNet) for cross-modality interactions in the decoder using global context and cross-modality fusion modules. The proposed modules fully use cross-modality information to reduce information loss. To enhance the robustness of RTLNet, we add a path to reuse the initial predictions from the decoder and introduce a ladder-shaped feature consistency module to further leverage multiscale features. Experiments are conducted with the proposed RTLNet and nine recent RGB-D indoor scene parsing methods on the NYUv2 and SUN-RGBD indoor scene datasets; the results show that the RTLNet outperforms the other methods.
Pages: 429-433
Page count: 5