MIDWRSEG: ACQUIRING ADAPTIVE MULTI-SCALE CONTEXTUAL INFORMATION FOR ROAD-SCENE SEMANTIC SEGMENTATION

Cited by: 0
Authors
Su, Bing [1]
Jin, Peng [1]
Lin, Yifeng [1]
Wang, Fuyang [1]
Affiliation
[1] Changzhou Univ, Sch Comp Sci & Artif Intelligence, 2468 YanZeng West Rd, Changzhou, Jiangsu, Peoples R China
Keywords
Deep convolutional network; attention mechanism; semantic segmentation; autonomous driving; NETWORK
DOI
10.31577/cai_2024_4_849
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present MIDWRSeg, a simple neural-network-based semantic segmentation model. For complex road scenes, a large receptive field gathered at multiple scales is crucial for semantic segmentation. CNN architectures currently need a way to establish long-range dependencies (large receptive fields) akin to those provided by the attention mechanism of the Transformer architecture. However, the attention mechanism, built from matrix operations on Query, Key and Value, is too computationally expensive for real-time semantic segmentation models. Therefore, a Multi-Scale Convolutional Attention (MSCA) block is constructed from inexpensive convolution operations to form long-range dependencies. The model adopts a Simple Inverted Residual (SIR) block for feature extraction in the initial encoding stage. After downsampling, the reduced-resolution feature maps pass through a sequence of stacked MSCA blocks, which form multi-scale long-range dependencies. Finally, to further enrich the range of adaptive receptive-field sizes, an Internal Depth Wise Residual (IDWR) block is introduced. In the decoding stage, a simple FCN-like decoder is used to reduce computational cost. Our method is competitive with existing real-time encoder-decoder semantic segmentation models on the Cityscapes and CamVid datasets. MIDWRSeg achieves 74.2% mIoU at 88.9 FPS on the Cityscapes test set and 76.8% mIoU at 95.2 FPS on the CamVid test set.
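The abstract's argument rests on how receptive fields grow when convolutions are applied after downsampling. The following is a minimal sketch of the standard receptive-field recurrence for a stack of convolution layers, not the authors' code. The function name `receptive_field`, the three-layer stride-2 stem, and the branch kernel sizes (7, 11, 21) are illustrative assumptions and are not taken from the paper.

```python
from typing import Iterable, Tuple

# Each layer is described as (kernel_size, stride, dilation).
Layer = Tuple[int, int, int]


def receptive_field(layers: Iterable[Layer]) -> int:
    """Return the receptive field (in input pixels, along one axis)
    of a sequential stack of convolution layers."""
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stem: three 3x3 stride-2 convolutions (8x downsampling).
stem = [(3, 2, 1), (3, 2, 1), (3, 2, 1)]

# Hypothetical multi-scale branches applied after the stem.
branches = {
    "kernel 7": [(7, 1, 1)],
    "kernel 11": [(11, 1, 1)],
    "kernel 21": [(21, 1, 1)],
}

for name, branch in branches.items():
    print(name, receptive_field(stem + branch))
```

Because each branch runs on an 8x-downsampled feature map in this sketch, every extra kernel tap widens the receptive field by 8 input pixels, which is why moderately sized kernels applied after downsampling can cover long distances cheaply.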
Pages: 849-873
Page count: 25
Related papers
48 in total
  • [1] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
    Badrinarayanan, Vijay
    Kendall, Alex
    Cipolla, Roberto
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) : 2481 - 2495
  • [2] Large-Scale Machine Learning with Stochastic Gradient Descent
    Bottou, Leon
    [J]. COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 177 - 186
  • [3] Segmentation and Recognition Using Structure from Motion Point Clouds
    Brostow, Gabriel J.
    Shotton, Jamie
    Fauqueur, Julien
    Cipolla, Roberto
    [J]. COMPUTER VISION - ECCV 2008, PT I, PROCEEDINGS, 2008, 5302 : 44 - +
  • [4] Semantic object classes in video: A high-definition ground truth database
    Brostow, Gabriel J.
    Fauqueur, Julien
    Cipolla, Roberto
    [J]. PATTERN RECOGNITION LETTERS, 2009, 30 (02) : 88 - 97
  • [5] Chaurasia A, 2017, 2017 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP)
  • [6] Chen L.C., 2014, ARXIV14127062
  • [7] Chen LC, 2017, Arxiv, DOI arXiv:1706.05587
  • [8] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
    Chen, Liang-Chieh
    Papandreou, George
    Kokkinos, Iasonas
    Murphy, Kevin
    Yuille, Alan L.
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) : 834 - 848
  • [9] The Cityscapes Dataset for Semantic Urban Scene Understanding
    Cordts, Marius
    Omran, Mohamed
    Ramos, Sebastian
    Rehfeld, Timo
    Enzweiler, Markus
    Benenson, Rodrigo
    Franke, Uwe
    Roth, Stefan
    Schiele, Bernt
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 3213 - 3223
  • [10] Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
    Ding, Xiaohan
    Zhang, Xiangyu
    Han, Jungong
    Ding, Guiguang
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11953 - 11965