WaveNet: Wavelet Network With Knowledge Distillation for RGB-T Salient Object Detection

Citations: 62
Authors
Zhou, Wujie [1]
Sun, Fan [1, 2]
Jiang, Qiuping [3]
Cong, Runmin [4]
Hwang, Jenq-Neng [5]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 308232, Singapore
[3] Ningbo Univ, Sch Informat Sci & Engn, Ningbo 315211, Peoples R China
[4] Shandong Univ, Sch Control Sci & Engn, Jinan, Peoples R China
[5] Univ Washington, Dept Elect Engn, Seattle, WA 98105 USA
Funding
National Natural Science Foundation of China
Keywords
Transformers; Feature extraction; Discrete wavelet transforms; Training; Knowledge engineering; Cross layer design; Convolutional neural networks; Wavelet; knowledge distillation; discrete wavelet transform; progressively stretched sine-cosine module; edge-aware module; FUSION; IMAGE;
DOI
10.1109/TIP.2023.3275538
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, various neural network architectures for computer vision have been devised, such as the visual transformer and multilayer perceptron (MLP). A transformer based on an attention mechanism can outperform a traditional convolutional neural network. Compared with the convolutional neural network and transformer, the MLP introduces less inductive bias and achieves stronger generalization. In addition, a transformer shows an exponential increase in the inference, training, and debugging times. Considering a wave function representation, we propose the WaveNet architecture that adopts a novel vision task-oriented wavelet-based MLP for feature extraction to perform salient object detection in RGB (red-green-blue)-thermal infrared images. In addition, we apply knowledge distillation to a transformer as an advanced teacher network to acquire rich semantic and geometric information and guide WaveNet learning with this information. Following the shortest-path concept, we adopt the Kullback-Leibler distance as a regularization term for the RGB features to be as similar to the thermal infrared features as possible. The discrete wavelet transform allows for the examination of frequency-domain features in a local time domain and time-domain features in a local frequency domain. We apply this representation ability to perform cross-modality feature fusion. Specifically, we introduce a progressively cascaded sine-cosine module for cross-layer feature fusion and use low-level features to obtain clear boundaries of salient objects through the MLP. Results from extensive experiments indicate that the proposed WaveNet achieves impressive performance on benchmark RGB-thermal infrared datasets. The results and code are publicly available at https://github.com/nowander/WaveNet.
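The Kullback-Leibler regularization mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the WaveNet code: the function names `softmax` and `kl_regularizer`, the softmax normalization of the features, the flat feature vectors, and the direction KL(thermal || RGB) are all assumptions made for illustration.

```python
import math


def softmax(values):
    """Turn a flat feature vector into a probability distribution."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_regularizer(rgb_feat, thermal_feat, eps=1e-12):
    """KL(thermal || rgb) over softmax-normalized features.

    Assumption: the thermal features act as the reference distribution
    that the RGB features are pulled toward.
    """
    p = softmax(thermal_feat)  # reference distribution
    q = softmax(rgb_feat)      # distribution being regularized
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


# Hypothetical flattened feature vectors for the two modalities.
rgb = [0.2, 1.5, -0.3, 0.8]
thermal = [0.1, 1.1, 0.4, 0.9]

loss_diff = kl_regularizer(rgb, thermal)  # positive when the features differ
loss_same = kl_regularizer(rgb, rgb)      # zero when the features match
```

In training, a term like this would typically be added to the main saliency loss with a weighting factor, so that minimizing it draws the two modalities' feature distributions together.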
Pages: 3027-3039
Number of pages: 13