A Multi-Task Vision Transformer for Segmentation and Monocular Depth Estimation for Autonomous Vehicles

Times cited: 12
Authors
Bavirisetti, Durga Prasad [1]
Martinsen, Herman Ryen [2]
Kiss, Gabriel Hanssen [1]
Lindseth, Frank [1]
Affiliations
[1] Norwegian Univ Sci & Technol, Dept Comp Sci, N-7034 Trondheim, Norway
[2] Capgemini, N-1671 Fredrikstad, Norway
Source
IEEE OPEN JOURNAL OF INTELLIGENT TRANSPORTATION SYSTEMS | 2023 / Vol. 4
Keywords
Vision transformer; monocular depth prediction; autonomous vehicles; segmentation; multi-task
DOI
10.1109/OJITS.2023.3335648
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the use of Vision Transformers for processing and understanding visual data in an autonomous driving setting. Specifically, we explore the use of Vision Transformers for semantic segmentation and monocular depth estimation using only a single image as input. We present state-of-the-art Vision Transformers for these tasks and combine them into a multitask model. Through multiple experiments on four different street image datasets, we demonstrate that the multitask approach significantly reduces inference time while maintaining high accuracy for both tasks. Additionally, we show that changing the size of the Transformer-based backbone can be used as a trade-off between inference speed and accuracy. Furthermore, we investigate the use of synthetic data for pre-training and show that it effectively increases the accuracy of the model when real-world data is limited.
Pages: 909-928
Page count: 20