Attention guided contextual feature fusion network for salient object detection

Cited by: 33
Authors
Zhang, Jin [1]
Shi, Yanjiao [1]
Zhang, Qing [1]
Cui, Liu [1]
Chen, Ying [1]
Yi, Yugen [2]
Affiliations
[1] Shanghai Inst Technol, Sch Comp Sci & Informat Engn, Shanghai 201418, Peoples R China
[2] Jiangxi Normal Univ, Sch Software, Nanchang 330022, Jiangxi, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai;
Keywords
Salient object detection; Fully convolutional neural network; Attention mechanism; Feature fusion; CONVOLUTIONAL NEURAL-NETWORK; MODEL;
DOI
10.1016/j.imavis.2021.104337
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In recent years, the Convolutional Neural Network (CNN) has been widely used in various visual tasks because of its powerful feature extraction ability. Salient object detection methods based on CNNs have also achieved strong performance. Although a large amount of feature information can be obtained through a CNN, the key to improving the quality of saliency maps lies in making full use of high- and low-level features and their relationships. Some previous works merged high- and low-level features without further processing, which resulted in blurred saliency maps and, in complex scenes, even an inability to distinguish the foreground from the background. To solve this problem, we propose an Attention guided Contextual Feature Fusion Network (ACFFNet) for salient object detection. The proposed ACFFNet consists of three main modules: the Multi-field Channel Attention (MCA) module, the Contextual Feature Fusion (CFF) module, and the feature Self-Refinement (SR) module. The MCA module selects features from different receptive fields, the CFF module efficiently aggregates contextual features, and the SR module repairs the holes in the prediction maps caused by contradictory responses of different layers. In addition, we propose a Cross-Consistency Enhancement (CCE) loss to guide the network to focus on more detailed information and to highlight the difference between foreground and background. Experimental results on six benchmark datasets show that the proposed method outperforms the state-of-the-art methods. (c) 2021 Elsevier B.V. All rights reserved.
Pages: 14
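The abstract names the MCA, CFF, and SR modules but gives no implementation details. As a rough illustration of the kind of mechanism the MCA module describes (selecting features gathered from different receptive fields via channel attention), here is a minimal PyTorch sketch. The branch count, 3x3 kernels with dilation rates 1/2/4, and the squeeze-and-excitation style gate with reduction ratio 16 are all illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of a multi-receptive-field channel attention block,
# in the spirit of the MCA module described in the abstract.
# All design choices below are assumptions for illustration only.
import torch
import torch.nn as nn


class MultiFieldChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Three parallel branches with different receptive fields
        # (assumed: 3x3 convs with dilation rates 1, 2, 4).
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        # Squeeze-and-excitation style gate over the fused feature map.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the features gathered at different receptive fields.
        multi_field = sum(branch(x) for branch in self.branches)
        # Global average pooling -> per-channel attention weights.
        b, c, _, _ = multi_field.shape
        weights = self.fc(multi_field.mean(dim=(2, 3))).view(b, c, 1, 1)
        # Re-weight channels so the network can emphasize the most
        # informative receptive-field responses per channel.
        return multi_field * weights


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    out = MultiFieldChannelAttention(64)(feats)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

The padding equal to the dilation rate keeps all branch outputs at the input resolution, so they can be summed directly before the channel gate is applied.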