A Road Segmentation Model Based on Mixture of the Convolutional Neural Network and the Transformer Network

Cited by: 4

Authors:

Xu, Fenglei [1]

Zhao, Haokai [1]

Hu, Fuyuan [1]

Shen, Mingfei [1]

Wu, Yifei [1]

Affiliation:

[1] Suzhou Univ Sci & Technol, Suzhou 215009, Peoples R China

Source:

CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES | 2023 / Vol. 135 / No. 02

Keywords:

Image segmentation; transformer; mix block; U-shaped structures;

DOI:

10.32604/cmes.2022.023217

CLC number (Chinese Library Classification):

T [Industrial Technology]

Discipline code:

08

Abstract:

Convolutional neural networks (CNNs) built on U-shaped structures and skip connections play a pivotal role in many image segmentation tasks. Recently, Transformers have begun to set new trends in image segmentation. A Transformer layer can model the relationships among all pixels, so the two approaches complement each other well. Building on these characteristics, we combine a Transformer pipeline with a CNN pipeline to gain the advantages of both. The input image is fed into a U-shaped encoder-decoder architecture that empirically combines self-attention and convolution, with skip connections used to learn local and global semantic features. In parallel, the same image is fed into a CNN architecture. A Mix block then fuses the two pipelines to produce the final segmentation result. The resulting mixture model of the convolutional neural network and the Transformer network for road segmentation (MCTNet) achieves effective segmentation results on the KITTI dataset and on our self-built Unstructured Road Scene (URS) dataset. Code, self-built datasets, and trainable models will be available at https://github.com/xflxfl1992/MCTNet.
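The abstract rests on two ideas: a Transformer layer relates every pixel to every other pixel, and a Mix block fuses the outputs of the Transformer and CNN pipelines. A small pure-Python sketch can illustrate both. The attention part is standard single-head scaled dot-product self-attention, simplified by assuming identity query/key/value projections (a real layer learns these). The record does not say how the Mix block works, so `mix_block` here is a hypothetical stand-in: a per-pixel weighted average of the two branches' road probabilities, with an assumed weight `alpha` and `threshold`. It is not MCTNet's published design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention over pixel tokens.

    tokens: list of N feature vectors (one per pixel), each of length d.
    Simplifying assumption: the query, key and value projections are the
    identity, so Q = K = V = tokens. A real Transformer layer learns them.
    Every output token is a weighted sum over ALL N input tokens, which is
    how the layer relates each pixel to every other pixel.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for query in tokens:
        # Similarity of this pixel to every pixel in the image, itself included.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        # Weighted sum of all value vectors, one output channel at a time.
        outputs.append([sum(w * value[j] for w, value in zip(weights, tokens))
                        for j in range(d)])
    return outputs


def mix_block(cnn_prob, transformer_prob, alpha=0.5, threshold=0.5):
    """HYPOTHETICAL fusion of the two pipelines (not MCTNet's published design).

    cnn_prob, transformer_prob: H x W grids of per-pixel road probabilities
    from the CNN branch and the Transformer branch. Each pixel's scores are
    averaged with weight alpha on the CNN branch, then thresholded into a
    binary road mask (1 = road, 0 = background).
    """
    return [
        [1 if alpha * c + (1 - alpha) * t >= threshold else 0
         for c, t in zip(cnn_row, t_row)]
        for cnn_row, t_row in zip(cnn_prob, transformer_prob)
    ]
```

For example, `mix_block([[0.9, 0.2], [0.6, 0.1]], [[0.7, 0.4], [0.5, 0.3]])` returns `[[1, 0], [1, 0]]`. The lower-left pixel is labeled road only because the averaged score (0.55) crosses the threshold, even though the Transformer branch alone scores it at exactly 0.5 and neither branch is confident. A trained network would more likely learn its fusion weights, for instance with a convolution over concatenated feature maps. The fixed weighting here just keeps the sketch small.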

Pages: 1559-1570

Number of pages: 12
