Atrous convolutional feature network for weakly supervised semantic segmentation

Cited by: 0
Authors
Xu L. [1 ]
Xue H. [1 ]
Bennamoun M. [1 ]
Boussaid F. [2 ]
Sohel F. [3 ]
Affiliations
[1] Department of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Perth, 6009, WA
[2] School of Electrical, Electronics and Computer Engineering, The University of Western Australia, 35 Stirling Hwy, Perth, 6009, WA
[3] Discipline of Information Technology, Mathematics & Statistics, Murdoch University, Murdoch, 6150, WA
Funding
Australian Research Council
Keywords
Atrous convolution; Attention mechanism; Multi-label image classification; Multi-scale features; Weakly supervised semantic segmentation
DOI
10.1016/j.neucom.2020.09.045
Abstract
Weakly supervised semantic segmentation has been attracting increasing attention, as it can alleviate the need for expensive pixel-level annotations through the use of image-level labels. Relevant methods mainly rely on the implicit object localization ability of convolutional neural networks (CNNs). However, the generated object attention maps are often small and incomplete. In this paper, we propose an Atrous Convolutional Feature Network (ACFN) to generate dense object attention maps. This is achieved by enhancing the context representation of image classification CNNs. More specifically, cascaded atrous convolutions are used in the middle layers to retain sufficient spatial detail, and pyramidal atrous convolutions are used in the last convolutional layers to provide multi-scale context information for the extraction of object attention maps. Moreover, we propose an attentive fusion strategy to adaptively fuse the multi-scale features. Our method shows improvements over existing methods on both the PASCAL VOC 2012 and MS COCO datasets, achieving state-of-the-art performance. © 2020 Elsevier B.V.
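For readers who want a concrete picture of the two mechanisms the abstract names, the sketch below shows one plausible PyTorch rendering: a cascade of stride-1 atrous convolutions that enlarges the receptive field without reducing resolution, and a pyramidal atrous block whose multi-scale branches are fused with learned attention weights. This is not the authors' released code; the module names, dilation rates (2, 4 and 1, 6, 12, 18), and the global-pooling attention head are all assumptions made for illustration.

```python
# Illustrative sketch only (not the ACFN reference implementation):
# cascaded atrous convolutions + pyramidal atrous convolutions with
# attentive (softmax-weighted) fusion of multi-scale features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedAtrous(nn.Module):
    """Stacked atrous convolutions with stride 1: grows the receptive
    field while keeping the spatial resolution (and spatial detail)."""
    def __init__(self, ch, dilations=(2, 4)):  # rates are assumptions
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in dilations
        )

    def forward(self, x):
        for conv in self.convs:
            x = F.relu(conv(x))
        return x

class PyramidalAtrousFusion(nn.Module):
    """Parallel atrous convolutions at several rates (multi-scale context),
    fused by softmax-normalized per-branch attention weights predicted
    from globally pooled features (one possible 'attentive fusion')."""
    def __init__(self, in_ch, out_ch, dilations=(1, 6, 12, 18)):
        super().__init__()
        # padding = dilation keeps H x W unchanged across branches.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False)
            for d in dilations
        )
        self.attn = nn.Conv2d(in_ch, len(dilations), 1)  # one logit per branch

    def forward(self, x):
        feats = [F.relu(b(x)) for b in self.branches]  # K tensors, B x C x H x W
        w = torch.softmax(self.attn(F.adaptive_avg_pool2d(x, 1)), dim=1)  # B x K x 1 x 1
        # Adaptive fusion: attention-weighted sum of the multi-scale maps.
        return sum(w[:, k:k + 1] * f for k, f in enumerate(feats))

if __name__ == "__main__":
    trunk = CascadedAtrous(ch=256)
    head = PyramidalAtrousFusion(in_ch=256, out_ch=256)
    x = torch.randn(2, 256, 33, 33)  # a mid-level feature map
    print(head(trunk(x)).shape)      # torch.Size([2, 256, 33, 33])
```

Note that both modules preserve spatial resolution, which is the point the abstract makes: dense attention maps require feature maps that have not been downsampled away.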
Pages: 115-126
Page count: 11
References (58 in total)
[1]  
Khan S., Rahmani H., Shah S.A.A., Bennamoun M., A guide to convolutional neural networks for computer vision, Synthesis Lectures on Computer Vision, 8, 1, pp. 1-207, (2018)
[2]  
Xu L., Bennamoun M., An S., Sohel F., Boussaid F., Classification of corals in reflectance and fluorescence images using convolutional neural network representations, ICASSP, pp. 1493-1497, (2018)
[3]  
Xu L., Bennamoun M., Boussaid F., An S., Sohel F., Coral classification using DenseNet and cross-modality transfer learning, IJCNN, pp. 1-8, (2019)
[4]  
Deng J., Dong W., Socher R., Li L.-J., Li K., Fei-Fei L., ImageNet: A large-scale hierarchical image database, CVPR, pp. 248-255, (2009)
[5]  
Everingham M., Van Gool L., Williams C.K., Winn J., Zisserman A., The PASCAL Visual Object Classes (VOC) challenge, IJCV, 88, 2, pp. 303-338, (2010)
[6]  
Lin T.-Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollar P., Zitnick C.L., Microsoft COCO: Common objects in context, ECCV, pp. 740-755, (2014)
[7]  
Dai J., He K., Sun J., BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation, ICCV, pp. 1635-1643, (2015)
[8]  
Khoreva A., Benenson R., Hosang J.H., Hein M., Schiele B., Simple does it: Weakly supervised instance and semantic segmentation, CVPR, (2017)
[9]  
Lin D., Dai J., Jia J., He K., Sun J., ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation, CVPR, pp. 3159-3167, (2016)
[10]  
Bearman A., Russakovsky O., Ferrari V., Fei-Fei L., What's the point: Semantic segmentation with point supervision, ECCV, pp. 549-565, (2016)