A Driving Area Detection Algorithm Based on Improved Swin Transformer

Cited: 0
Authors
Liu, Shuang [1 ,2 ]
Li, Ying [1 ,2 ]
Sheng, Huankun [1 ,2 ]
Affiliations
[1] Jilin University, College of Computer Science and Technology, Changchun 130012, Jilin, China
[2] Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Changchun 130012, Jilin, China
Keywords
CNNs; driving area detection; multiscale fusion; semantic segmentation; Swin Transformer; free space
DOI
10.14569/IJACSA.2024.0150224
CLC Number
TP301 (Theory and Methods)
Discipline Code
081202
Abstract
Drivable area (free space) detection is an essential part of the perception system of an autonomous vehicle: it helps intelligent vehicles understand road conditions and determine safe driving areas. Most driving area detection algorithms are based on semantic segmentation, which assigns each pixel to a category, and recent advances in convolutional neural networks (CNNs) have significantly advanced semantic segmentation in driving scenarios. Although promising results have been obtained, existing CNN-based drivable area detection methods process one local neighborhood at a time, and this locality of the convolution operation fails to capture long-range dependencies. To address this problem, we propose Multi-Swin, an improved Swin Transformer based on shifted windows. First, an improved patch merging strategy is proposed to enhance feature interactions between adjacent patches. Second, a decoder with upsampling layers is designed to restore the resolution of the feature map. Finally, a multi-scale fusion module is used to improve the representation of global semantic and geometric information. Our method is evaluated on the publicly available Cityscapes dataset; the experimental results show that it achieves 91.92% IoU on road segmentation, surpassing state-of-the-art methods.
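Since the abstract builds on the Swin Transformer's patch merging and reports road segmentation in IoU, a small sketch may help. The PyTorch snippet below shows the standard Swin patch-merging step, i.e., the baseline that the paper's improved strategy presumably modifies (the abstract does not give its details), plus a plain binary-IoU metric of the kind behind the reported 91.92% figure. All class and function names here (PatchMerging, binary_iou) are illustrative, not the authors' code.

import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Standard Swin patch merging: concatenate each 2x2 neighborhood of
    patch features along the channel axis (C -> 4C), then project to 2C."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) with even H and W
        x0 = x[:, 0::2, 0::2, :]  # top-left patch of each 2x2 block
        x1 = x[:, 1::2, 0::2, :]  # bottom-left
        x2 = x[:, 0::2, 1::2, :]  # top-right
        x3 = x[:, 1::2, 1::2, :]  # bottom-right
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))      # (B, H/2, W/2, 2C)

def binary_iou(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Intersection-over-union for a binary (road / not-road) mask."""
    pred, target = pred.bool(), target.bool()
    union = (pred | target).sum().item()
    return (pred & target).sum().item() / union if union else 1.0

# Toy shape check: a 96-channel 8x8 patch grid merges to 192 channels at 4x4.
feats = torch.randn(1, 8, 8, 96)
print(PatchMerging(96)(feats).shape)  # torch.Size([1, 4, 4, 192])

Merging each 2x2 patch neighborhood halves the spatial resolution while doubling the channel width, mirroring the stage-wise downsampling of CNN backbones; the paper's decoder then upsamples these features back toward input resolution.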
Pages: 227-234
Page count: 8
References
39 in total
[1] Badrinarayanan, Vijay; Kendall, Alex; Cipolla, Roberto. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
[2] Chen, Liang-Chieh; et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv:1412.7062, 2016.
[3] Chen, Liang-Chieh; et al. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv:1706.05587, 2017.
[4] Chen, Liang-Chieh; Zhu, Yukun; Papandreou, George; Schroff, Florian; Adam, Hartwig. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. Computer Vision - ECCV 2018, Part VII, LNCS 11211, 2018: 833-851.
[5] Choi, Hyunguk; Ahn, Hoyeon; Kim, Joonmo; Jeon, Moongu. ADFNet: Accumulated Decoder Features for Real-Time Semantic Segmentation. IET Computer Vision, 2020, 14(8): 555-563.
[6] Chollet, Francois. Xception: Deep Learning with Depthwise Separable Convolutions. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 1800-1807.
[7] Cordts, Marius; Omran, Mohamed; Ramos, Sebastian; Rehfeld, Timo; Enzweiler, Markus; Benenson, Rodrigo; Franke, Uwe; Roth, Stefan; Schiele, Bernt. The Cityscapes Dataset for Semantic Urban Scene Understanding. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 3213-3223.
[8] Dong, Chaoxian. Image Semantic Segmentation Method Based on GAN Network and ERFNet Model. The Journal of Engineering, 2021, 2021(4): 189-200.
[9] Dosovitskiy, Alexey; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929, 2020.
[10] Fu, Jun; Liu, Jing; Tian, Haijie; Li, Yong; Bao, Yongjun; Fang, Zhiwei; Lu, Hanqing. Dual Attention Network for Scene Segmentation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 3141-3149.