Salient Object Detection With Dual-Branch Stepwise Feature Fusion and Edge Refinement

Cited by: 9

Authors:

Song, Xiaogang [1,2]

Guo, Fuqiang [1,2]

Zhang, Lei [3]

Lu, Xiaofeng [1,2]

Hei, Xinhong [1,2]

Affiliations:

[1] Xian Univ Technol, Sch Comp Sci & Engn, Xian 710048, Peoples R China

[2] Univ Shaanxi Prov, Engn Res Ctr Human Machine Integrat Intelligent Ro, Xian 710048, Peoples R China

[3] Aviat Ind Corp China AVIC, Xian Aeronaut Comp Tech Res Inst, Xian 710089, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 04

Keywords:

Feature extraction; Transformers; Task analysis; Semantics; Image edge detection; Convolutional neural networks; Object detection; Salient object detection; dual-branch encoder; attention; gradual feature fusion; edge enhancement; NETWORK; ATTENTION; DRIVEN; MODEL;

DOI:

10.1109/TCSVT.2023.3312859

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808; 0809

Abstract:

In recent years, Transformers have been increasingly applied to salient object detection with good results. However, the Transformer's emphasis on global modeling can lead to the loss of local details that are important in salient object detection. A feature extraction backbone based on a convolutional neural network (CNN) is good at extracting local detail features because its receptive field expands gradually, but it is limited by the size of that receptive field, which leaves it with an insufficient ability to extract global semantic features. Therefore, this paper combines the Transformer with a CNN and presents a dual-branch encoder so that the extracted features contain rich global semantic information as well as local detail features. In addition, because the Transformer and the CNN extract different kinds of features, fusing them may introduce noise, so the different features need to be processed accordingly during fusion. The proposed fusion enhancement module (FEM) fuses the features of the two branches step by step, using a hybrid attention mechanism to carry out a weighted fusion of the different features. This progressive approach minimizes the differences between the features of the two branches so that the merged features retain the semantic and detail features extracted by the two branches to the greatest extent. Considering the loss of detailed information caused by repeated downsampling, we propose an edge refinement module (ERM) to address the need for accurate outline prediction. This module derives edge features from the salient features and gradually refines the prediction results by incorporating these edge features. It makes full use of the connection between salient features and edge features and does not introduce an additional edge extraction branch. Extensive experimental evaluations on five benchmark datasets demonstrate the superior performance of our method compared to other existing approaches.
Code can be found at https://github.com/gfq1605694825/DSRNet-main.
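The abstract does not specify how the hybrid attention in the FEM computes its weights, so the following is only a minimal sketch of the general idea of attention-weighted fusion of two feature branches. It is not the authors' implementation. The function names (`channel_gate`, `fuse`), the sigmoid gate over a global channel average, and the per-channel normalization are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_gate(feature):
    # Hypothetical gate: sigmoid of each channel's global average.
    # `feature` is a list of channels, each a flat list of activations.
    return [sigmoid(sum(ch) / len(ch)) for ch in feature]

def fuse(cnn_feat, trans_feat):
    # Assumed fusion rule (not taken from the paper): each channel of the
    # output is a convex combination of the two branches, with weights
    # given by the normalized gates of the CNN and Transformer channels.
    g_cnn = channel_gate(cnn_feat)
    g_trans = channel_gate(trans_feat)
    fused = []
    for ch_c, ch_t, w_c, w_t in zip(cnn_feat, trans_feat, g_cnn, g_trans):
        total = w_c + w_t
        fused.append([(w_c * a + w_t * b) / total for a, b in zip(ch_c, ch_t)])
    return fused

# Toy usage: one channel with four spatial positions per branch.
cnn_feat = [[0.0, 0.0, 0.0, 0.0]]
trans_feat = [[2.0, 2.0, 2.0, 2.0]]
fused = fuse(cnn_feat, trans_feat)
```

Because the weights are normalized per channel, every fused value lies between the corresponding values of the two branches, and the branch with the stronger average response contributes more.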

Pages: 2832-2844

Page count: 13

