Hierarchical Decoding Network Based on Swin Transformer for Detecting Salient Objects in RGB-T Images

Times cited: 19
Authors
Sun, Fan [1]
Zhou, Wujie [1]
Ye, Lv [1]
Yu, Lu [2]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] Zhejiang Univ, Coll Informat & Elect Engn, Hangzhou 310027, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Semantics; Decoding; Transformers; Convolution; Training; Image segmentation; Transformer; hierarchical decoder; semantic information guidance; sine-cosine fusion; global saliency perception; FUSION
DOI
10.1109/LSP.2022.3194843
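Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809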
Abstract
Although conventional deep convolutional neural networks are effective at extracting contextual semantics for object segmentation, recent vision transformers capture global image information and model long-range semantic associations better. In addition, some existing saliency detection methods ignore the guidance that high-level semantic information can provide to low-level features during decoding, and rely only on layer-by-layer transmission of encoder features. We therefore propose a hierarchical decoding network based on the Swin Transformer for red-green-blue and thermal (RGB-T) salient object detection (SOD). First, a sine-cosine fusion module makes the two modalities interact and exploits their complementarity. Second, as a further fusion stage, an advanced semantic information guidance module coordinates high-level semantic information with low-level detail features. Finally, a global saliency perception module fuses cross-layer information along a top-down path. Comprehensive experiments show that the proposed network outperforms 12 state-of-the-art methods on three RGB-T SOD datasets.
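The abstract names three stages but gives no formulas. The sketch below is therefore a minimal, hypothetical illustration of the general pattern it describes, and it is not the authors' implementation. The pattern has three parts: per-level RGB-T fusion, high-level features guiding low-level ones, and top-down decoding into a saliency map. The code uses plain Python on single-channel 2D lists, and the Swin Transformer backbone is not modeled, so the feature maps are given as inputs. Several pieces are assumptions made only for illustration:

- the helper names `fuse_rgbt`, `guide`, and `decode_top_down`;
- the sin²θ/cos²θ weighting, standing in for the sine-cosine fusion module;
- the sigmoid gating rule.

```python
import math

# A single-channel 2D feature map, stored row-major (hypothetical representation).
Map = list[list[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_rgbt(rgb: Map, thermal: Map, theta: float) -> Map:
    """Hypothetical stand-in for sine-cosine fusion (not the paper's module):
    the weights sin^2(theta) and cos^2(theta) always sum to 1, so the result is
    a convex mix of the two modalities. A trained network would learn theta;
    here it is a fixed argument."""
    w_rgb, w_t = math.sin(theta) ** 2, math.cos(theta) ** 2
    return [[w_rgb * r + w_t * t for r, t in zip(r_row, t_row)]
            for r_row, t_row in zip(rgb, thermal)]


def upsample2x(x: Map) -> Map:
    """Nearest-neighbour 2x upsampling: each value becomes a 2x2 block."""
    out: Map = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def guide(low: Map, high: Map) -> Map:
    """Assumed guidance rule: gate the low-level map with the sigmoid of the
    upsampled high-level map, then add the high-level map back in."""
    up = upsample2x(high)
    return [[lv * sigmoid(hv) + hv for lv, hv in zip(l_row, h_row)]
            for l_row, h_row in zip(low, up)]


def decode_top_down(levels: list[Map]) -> Map:
    """levels[0] is the finest map, and each later level has half the
    resolution. Decoding starts at the coarsest level and moves to the finest.
    A final sigmoid maps the result to saliency scores in (0, 1)."""
    out = levels[-1]
    for low in reversed(levels[:-1]):
        out = guide(low, out)
    return [[sigmoid(v) for v in row] for row in out]


if __name__ == "__main__":
    # Two hypothetical pyramid levels per modality: 4x4 (fine) and 2x2 (coarse).
    rgb = [[[0.5] * 4 for _ in range(4)], [[1.0] * 2 for _ in range(2)]]
    thermal = [[[1.5] * 4 for _ in range(4)], [[3.0] * 2 for _ in range(2)]]
    fused = [fuse_rgbt(r, t, math.pi / 4) for r, t in zip(rgb, thermal)]
    saliency = decode_top_down(fused)
    print(len(saliency), len(saliency[0]))  # 4 4
```

The design choice here is simplicity. A convex modality weighting keeps fused values on the same scale as the inputs. The sigmoid gate lets coarse semantics suppress or pass fine detail before each upsampling step, which mirrors in spirit the "high-level guides low-level" idea in the abstract.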
Pages: 1714-1718
Number of pages: 5