A Single-Stream Segmentation and Depth Prediction CNN for Autonomous Driving

Cited by: 30

Authors:

Aladem, Mohamed [1]

Rawashdeh, Samir A. [1]

Affiliations:

[1] Univ Michigan Dearborn, Dearborn, MI 48128 USA

Source:

IEEE INTELLIGENT SYSTEMS | 2021 / Vol. 36 / Issue 04

Keywords:

Autonomous vehicles; Computer Vision for transportation; Deep learning in robotics and automation; Semantic scene understanding;

DOI:

10.1109/MIS.2020.2993266

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Convolutional neural networks (CNN) have been used successfully in solving many challenging visual perception tasks facing mobile robots and self-driving cars. To facilitate deploying such models on embedded hardware onboard mobile robots that have limited resources, multitask learning approaches have become common. In typically used multitask learning, a shared encoder network extracts features from inputs whereas multiple task-specific decoders transform these features into their target output. However, properly combining different tasks' losses into the final network loss such that each task is making progress learning is a major challenge in these approaches. In this article, we present an innovative approach to extend a typical single-task network with the capability of performing two tasks without multiple decoders, i.e., a single-stream two-task network. The two output tasks are semantic segmentation and monocular depth prediction which are essential tasks in visual perception for autonomous driving. The method is centered on solving semantic segmentation with a regression loss function rather than a classification one. With our approach, we seize multitask learning benefits of reduced overhead and enhanced generalization while alleviating the need to balance different loss functions. Experimental evaluations with baseline single tasks and a multitask network are presented.

Citation:

Pages: 79-85

Page count: 7

References: 20 in total

[1] Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes [J].

Abu Alhaija, Hassan ;

Mustikovela, Siva Karthik ;

Mescheder, Lars ;

Geiger, Andreas ;

Rother, Carsten .

INTERNATIONAL JOURNAL OF COMPUTER VISION, 2018, 126 (09) :961-972

[2] Alhashim Ibraheem, 2018, High quality monocular depth estimation via transfer learning

[3] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J].

Badrinarayanan, Vijay ;

Kendall, Alex ;

Cipolla, Roberto .

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) :2481-2495

[4] Bilen Hakan, 2017, arXiv:1701.07275

[5] Multitask learning [J].

Caruana, R .

MACHINE LEARNING, 1997, 28 (01) :41-75

[6] MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning [J].

Chennupati, Sumanth ;

Sistu, Ganesh ;

Yogamani, Senthil ;

Rawashdeh, Samir A. .

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, :1200-1210

[7] Cicek Ozgun, 2016, Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016. 19th International Conference. Proceedings: LNCS 9901, P424, DOI 10.1007/978-3-319-46723-8_49

[8] The Cityscapes Dataset for Semantic Urban Scene Understanding [J].

Cordts, Marius ;

Omran, Mohamed ;

Ramos, Sebastian ;

Rehfeld, Timo ;

Enzweiler, Markus ;

Benenson, Rodrigo ;

Franke, Uwe ;

Roth, Stefan ;

Schiele, Bernt .

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223

[9] Eigen D, 2014, Advances in Neural Information Processing Systems, V27

[10] Dynamic Task Prioritization for Multitask Learning [J].

Guo, Michelle ;

Haque, Albert ;

Huang, De-An ;

Yeung, Serena ;

Li, Fei-Fei .

COMPUTER VISION - ECCV 2018, PT XVI, 2018, 11220 :282-299
