Lightweight Dual Stream Network With Knowledge Distillation for RGB-D Scene Parsing

Cited by: 4

Authors:

Zhang, Yuming [1]

Zhou, Wujie [1]

Ran, Xiaoxiao [2]

Fang, Meixin [1]

Affiliations:

[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China

[2] COMAC Shanghai Aircraft Mfg Co Ltd, 5G Innovat Ctr, Shanghai 201324, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Funding:

National Natural Science Foundation of China;

Keywords:

RGB-D scene parsing; integrated joint enhancement module; auxiliary extraction module; knowledge distillation;

DOI:

10.1109/LSP.2024.3378120

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Significant progress has been made in indoor scene parsing. Because mobile devices have limited hardware capacity, demand for lightweight networks is increasing; however, the design of lightweight networks for indoor scene parsing remains underexplored. We therefore propose a lightweight dual-stream network (LDSNet) with knowledge distillation (KD) for RGB-D indoor scene parsing. We first develop a two-stream network in three versions (LDSNet-tiny*, LDSNet-small*, and LDSNet-base, where * denotes a model obtained after KD) for different scenarios. In the main stream, an integrated joint enhancement module captures valuable information from both RGB and depth features, and a cascading integration module then processes this information to generate the final map. To improve model performance, an auxiliary extraction module in the auxiliary stream extracts feature information specifically for KD. During training, we use a hierarchical context loss to distill features and obtain LDSNet-tiny* and LDSNet-small*. Experiments on the NYUDv2 and SUN RGB-D datasets demonstrate that LDSNet-base achieves superior results, while LDSNet-tiny* and LDSNet-small* also exhibit satisfactory performance.


Pages: 855-859

Number of pages: 5

