Enhancing RGB-D Mirror Segmentation With a Neighborhood-Matching and Demand-Modal Adaptive Network Using Knowledge Distillation

Cited: 0
Authors
Zhou, Wujie [1 ]
Zhang, Han [1 ]
Liu, Yuanyuan [2 ,3 ]
Luo, Ting [4 ]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] China Univ Geosci, Sch Comp & Technol, Wuhan 430074, Peoples R China
[3] Nanyang Technol Univ, Coll Comp & Data Sci, Singapore 308232, Singapore
[4] Ningbo Univ, Coll Sci & Technol, Ningbo 315300, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Mirrors; Computational modeling; Computer vision; Semantic segmentation; Complexity theory; Image segmentation; Semantics; Knowledge transfer; Adaptation models; Noise; Mirror segmentation; knowledge distillation; sample complexity rater; multilevel distillation; SALIENT OBJECT DETECTION; SEMANTIC SEGMENTATION;
DOI
10.1109/TASE.2025.3547613
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812 ;
Abstract
Recent breakthroughs in computer vision have led to remarkable progress in autonomous vehicles and robotics. However, everyday objects such as mirrors pose unique challenges to computer vision systems owing to occlusion, reflection, and distortion. Moreover, existing deep learning models suffer from excessive parameter counts and high computational complexity, making many of them difficult to deploy offline. To address these issues, we propose an innovative solution: a neighborhood-matching and demand-modal adaptive network using knowledge distillation (KD), called NDANet-S*, designed specifically for red-green-blue-depth (RGB-D) mirror segmentation. NDANet-S* operates by iteratively matching detailed and semantic differences between neighborhood features during the encoding phase. It then complements information across modalities through demand-modal adaptation, enhancing heteromodal cross-complementation during the KD stage. In the decoding phase, semantic-enhancement features and iterative encoding features are deeply integrated, forming a strong foundation for multistage progressive knowledge transfer in the KD process. Furthermore, we introduce a multistage teacher-assisted KD scheme, guided by sample complexity, that works synergistically with the mirror segmentation model. This scheme comprises a sample complexity rater, heterogeneous cross-complementarity, and hierarchical progressive knowledge transfer. Experimental evaluations on publicly available datasets indicate that NDANet-S* significantly improves segmentation accuracy while keeping the parameter count unchanged, achieving state-of-the-art performance in mirror segmentation. The source code for our model is publicly available at: https://github.com/2021nihao/NMDANet.
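The multistage KD scheme summarized above builds on the standard logit-distillation objective. As an illustrative sketch only (this is the generic temperature-softened distillation loss in the style of Hinton et al., not the authors' NDANet-S* implementation, and the function names here are hypothetical):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature yields a softer
    # distribution, exposing more of the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients keep a comparable magnitude across T.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In practice this term is combined with a supervised loss (e.g., cross-entropy against ground-truth masks), and in a multistage teacher-assisted scheme it would be applied at several intermediate stages rather than only at the final logits.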
Pages: 12679-12692
Page count: 14