WaveNet: Wavelet Network With Knowledge Distillation for RGB-T Salient Object Detection

Times cited: 62
Authors
Zhou, Wujie [1]
Sun, Fan [1,2]
Jiang, Qiuping [3]
Cong, Runmin [4]
Hwang, Jenq-Neng [5]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 308232, Singapore
[3] Ningbo Univ, Sch Informat Sci & Engn, Ningbo 315211, Peoples R China
[4] Shandong Univ, Sch Control Sci & Engn, Jinan, Peoples R China
[5] Univ Washington, Dept Elect Engn, Seattle, WA 98105 USA
Funding
National Natural Science Foundation of China
Keywords
Transformers; Feature extraction; Discrete wavelet transforms; Training; Knowledge engineering; Cross layer design; Convolutional neural networks; Wavelet; knowledge distillation; discrete wavelet transform; progressively stretched sine-cosine module; edge-aware module; FUSION; IMAGE;
DOI
10.1109/TIP.2023.3275538
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, various neural network architectures for computer vision have been devised, such as the visual transformer and multilayer perceptron (MLP). A transformer based on an attention mechanism can outperform a traditional convolutional neural network. Compared with the convolutional neural network and transformer, the MLP introduces less inductive bias and achieves stronger generalization. In addition, a transformer shows an exponential increase in the inference, training, and debugging times. Considering a wave function representation, we propose the WaveNet architecture that adopts a novel vision task-oriented wavelet-based MLP for feature extraction to perform salient object detection in RGB (red-green-blue)-thermal infrared images. In addition, we apply knowledge distillation to a transformer as an advanced teacher network to acquire rich semantic and geometric information and guide WaveNet learning with this information. Following the shortest-path concept, we adopt the Kullback-Leibler distance as a regularization term for the RGB features to be as similar to the thermal infrared features as possible. The discrete wavelet transform allows for the examination of frequency-domain features in a local time domain and time-domain features in a local frequency domain. We apply this representation ability to perform cross-modality feature fusion. Specifically, we introduce a progressively cascaded sine-cosine module for cross-layer feature fusion and use low-level features to obtain clear boundaries of salient objects through the MLP. Results from extensive experiments indicate that the proposed WaveNet achieves impressive performance on benchmark RGB-thermal infrared datasets. The results and code are publicly available at https://github.com/nowander/WaveNet.
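The two numerical building blocks named in the abstract, a Kullback-Leibler term between RGB and thermal features and a discrete wavelet transform, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names `softmax`, `kl_divergence`, and `haar_dwt2` are hypothetical, the Haar wavelet is an assumed choice, and turning features into distributions with a softmax is likewise an assumption, since the abstract does not specify these details.

```python
import numpy as np


def softmax(x):
    """Turn a flat feature vector into a probability distribution."""
    z = x - x.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of the same shape."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * (np.log(p) - np.log(q))))


def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT of an array with even height and width.

    Returns four half-resolution sub-bands: the approximation, the
    top-minus-bottom row difference, the left-minus-right column
    difference, and the diagonal difference.
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    row_diff = (a + b - c - d) / 2.0
    col_diff = (a - b + c - d) / 2.0
    diag = (a - b - c + d) / 2.0
    return approx, row_diff, col_diff, diag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_feat = rng.normal(size=(4, 8))      # stand-in for an RGB feature map
    thermal_feat = rng.normal(size=(4, 8))  # stand-in for a thermal feature map

    # Assumed regularizer: KL between softmax-normalized, flattened features.
    p = softmax(rgb_feat.ravel())
    q = softmax(thermal_feat.ravel())
    print("KL(rgb || thermal):", kl_divergence(p, q))

    # Each sub-band has half the spatial resolution of the input.
    for band in haar_dwt2(rgb_feat):
        print(band.shape)
```

The KL term is zero only when the two distributions coincide, which is why it can act as a penalty pulling the RGB features toward the thermal ones. The Haar transform here is orthonormal, so the four sub-bands together preserve the energy of the input.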
Pages: 3027-3039
Number of pages: 13