Stripe Pooling Attention for Real-Time Semantic Segmentation

Cited by: 0
Authors
Lyu J. [1, 2]
Sun Y. [1, 2]
Xu P. [1, 2]
Affiliations
[1] College of Computer and Information Science, Chongqing Normal University, Chongqing
[2] Chongqing Center of Engineering Technology Research on Digital Agriculture Service, Chongqing Normal University, Chongqing
Keywords
attention mechanism; real scene; real-time semantic segmentation; strip pooling
DOI
10.3724/SP.J.1089.2023.19608
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Attention-based semantic segmentation algorithms struggle to balance segmentation speed and accuracy, which makes them difficult to apply in real scenes. To address this problem, we propose a lightweight real-time semantic segmentation algorithm based on strip-pooling attention. First, a lightweight backbone network extracts feature information, and a feature fusion module is constructed to obtain context information at different scales and thereby improve segmentation accuracy. Second, a strip attention module (SAM) is used to strengthen attention to long-range information, and horizontal strip pooling is added to SAM to reduce the computation needed to encode global context. Experimental results show that the proposed algorithm achieves high segmentation accuracy while meeting real-time requirements: on the Cityscapes test set, mIoU reaches 70.6% at an average segmentation speed of 92 frames per second; on the CamVid test set, mIoU reaches 66.4% at an average segmentation speed of 196 frames per second. © 2023 Institute of Computing Technology. All rights reserved.
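The idea of horizontal strip pooling named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the internals of SAM, so everything here is an assumption made for illustration: the single-channel feature map, the sigmoid gating, and the hypothetical function names `horizontal_strip_pool` and `strip_attention`. It is not the authors' module.

```python
import math

def horizontal_strip_pool(feature_map):
    """Average each row of an H x W map, giving one value per row (an H x 1 strip)."""
    return [sum(row) / len(row) for row in feature_map]

def strip_attention(feature_map):
    """Reweight each row by a sigmoid gate computed from its pooled strip value.

    Assumption: sigmoid gating stands in for whatever SAM actually uses.
    """
    pooled = horizontal_strip_pool(feature_map)
    gates = [1.0 / (1.0 + math.exp(-v)) for v in pooled]
    return [[g * x for x in row] for g, row in zip(gates, feature_map)]

# Toy 3 x 4 single-channel feature map (made-up values).
fmap = [
    [1.0, 3.0, 2.0, 2.0],
    [0.0, 0.0, 0.0, 0.0],
    [-4.0, -4.0, -4.0, -4.0],
]
pooled = horizontal_strip_pool(fmap)  # one summary value per row
attended = strip_attention(fmap)      # same shape as the input
```

In this sketch the pooled strip holds H values instead of H × W, so any context encoding applied to it works on far fewer elements than one applied to the full map, which is the kind of saving the abstract attributes to strip pooling.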
Pages: 1395-1404
Page count: 9