Enhancing RGB-D Mirror Segmentation With a Neighborhood-Matching and Demand-Modal Adaptive Network Using Knowledge Distillation

Cited: 0
Authors
Zhou, Wujie [1 ]
Zhang, Han [1 ]
Liu, Yuanyuan [2 ,3 ]
Luo, Ting [4 ]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] China Univ Geosci, Sch Comp & Technol, Wuhan 430074, Peoples R China
[3] Nanyang Technol Univ, Coll Comp & Data Sci, Singapore 308232, Singapore
[4] Ningbo Univ, Coll Sci & Technol, Ningbo 315300, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Mirrors; Computational modeling; Computer vision; Semantic segmentation; Complexity theory; Image segmentation; Semantics; Knowledge transfer; Adaptation models; Noise; Mirror segmentation; knowledge distillation; sample complexity rater; multilevel distillation; SALIENT OBJECT DETECTION; SEMANTIC SEGMENTATION;
DOI
10.1109/TASE.2025.3547613
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812 ;
Abstract
Recent breakthroughs in computer vision have led to remarkable progress in autonomous vehicles and robotics. However, everyday objects such as mirrors pose unique challenges to computer vision systems owing to occlusion, reflection, and distortion. Moreover, existing deep learning models suffer from excessive parameter counts and high computational complexity, making many of them difficult to deploy offline. To address these issues, we propose an innovative solution: a neighborhood-matching and demand-modal adaptive network using knowledge distillation (KD), called NDANet-S*, designed specifically for red-green-blue-depth (RGB-D) mirror segmentation. NDANet-S* operates by iteratively matching detailed and semantic differences between neighborhood features during the encoding phase. It then complements information across modalities through demand-modal adaptation, enhancing heteromodal cross-complementation during the KD stage. In the decoding phase, semantic-enhancement features and iterative encoding features are deeply integrated, forming a strong foundation for multistage progressive knowledge transfer in the KD process. Furthermore, we introduce a multistage teacher-assisted KD scheme, guided by sample complexity, that works synergistically with the mirror segmentation model. This scheme comprises a sample complexity rater, heterogeneous cross-complementarity, and hierarchical progressive knowledge transfer. Experimental evaluations on publicly available datasets indicate that NDANet-S* significantly improves segmentation accuracy while keeping the parameter count unchanged, achieving state-of-the-art performance in mirror segmentation. The source code for our model is publicly available at: https://github.com/2021nihao/NMDANet.
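The multistage KD scheme summarized above builds on the standard logit-distillation objective. As an illustrative sketch only (this is the generic temperature-softened distillation loss in the style of Hinton et al., not the authors' NDANet-S* implementation, and the function names here are hypothetical):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature yields a softer
    # distribution, exposing more of the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients keep a comparable magnitude across T.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In practice this term is combined with a supervised loss (e.g., cross-entropy against ground-truth masks), and in a multistage teacher-assisted scheme it would be applied at several intermediate stages rather than only at the final logits.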
Pages: 12679-12692
Page count: 14