Joint pyramid attention network for real-time semantic segmentation of urban scenes

Cited by: 0
Authors
Xuegang Hu
Liyuan Jing
Uroosa Sehar
Affiliations
[1] Chongqing University of Posts and Telecommunications, Key Lab of Intelligent Analysis and Decision on Complex Systems
[2] Chongqing University of Posts and Telecommunications, Multimedia Communications Research Laboratory
[3] Northeastern University, Cross Media Artificial Intelligence Laboratory
Source
Applied Intelligence | 2022, Vol. 52
Keywords
Attention mechanism; Encoder-decoder network; Feature pyramid module; Lightweight network; Real-time semantic segmentation;
Abstract
Semantic segmentation is an active research topic in computer vision and a fundamental technique for image understanding and analysis. However, most current semantic segmentation networks focus only on segmentation accuracy, ignoring the requirements for high processing speed and low computational complexity in mobile and embedded applications such as autonomous driving systems, drone applications, and fingerprint recognition systems. Because of their high computational cost, such networks struggle to meet practical industrial needs. To address this, we propose a joint pyramid attention network (JPANet) for real-time semantic segmentation. First, we propose a joint feature pyramid (JFP) module, which combines features from multiple network stages to learn multi-scale representations with strong semantic information, thereby improving pixel classification performance. Second, we build a spatial detail extraction (SDE) module to capture multi-level local features from the shallow layers of the network and compensate for the geometric information lost during down-sampling. Finally, we design a bilateral feature fusion (BFF) module, which integrates spatial and semantic information through a hybrid attention mechanism over the spatial and channel dimensions, making full use of the correspondence between high-level and low-level features. We conducted a series of experiments on two challenging urban road scene datasets (Cityscapes and CamVid) and achieved excellent results. On the Cityscapes dataset, our method achieves 71.62% mean Intersection over Union (mIoU) at 109.9 frames per second (FPS) on 512 × 1024 high-resolution images using a single 1080Ti GPU.
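The BFF module described in the abstract fuses shallow spatial detail with deep semantic features through attention applied over both the channel and spatial dimensions. The following is a minimal PyTorch sketch of such a bilateral fusion block, written only to illustrate the general idea; the class name, reduction ratio, additive fusion, and layer sizes are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn as nn


class BilateralFeatureFusion(nn.Module):
    """Hypothetical sketch of bilateral feature fusion with hybrid
    channel + spatial attention; configuration is illustrative only."""

    def __init__(self, channels: int):
        super().__init__()
        # Channel attention: global average pooling followed by a 1x1 bottleneck.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: a single-channel weight map from a 3x3 convolution.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the semantic branch to the spatial-detail resolution and fuse.
        high = nn.functional.interpolate(
            high, size=low.shape[2:], mode="bilinear", align_corners=False
        )
        fused = low + high
        # Re-weight channels, then positions, before the final projection.
        fused = fused * self.channel_att(fused)
        fused = fused * self.spatial_att(fused)
        return self.project(fused)


if __name__ == "__main__":
    low = torch.randn(1, 128, 64, 128)   # shallow, high-resolution features
    high = torch.randn(1, 128, 16, 32)   # deep, low-resolution features
    out = BilateralFeatureFusion(128)(low, high)
    print(out.shape)  # torch.Size([1, 128, 64, 128])
```

In this sketch the channel branch emphasizes semantically informative feature maps while the spatial branch highlights locations where detail matters, which mirrors the abstract's description of exploiting the correspondence between high-level and low-level features.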
Pages: 580-594
Number of pages: 14