BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

Cited by: 52

Authors:

Peng, Lang [1]

Chen, Zhirong [1]

Fu, Zhangjie [1]

Liang, Pengpeng [2]

Cheng, Erkang [1]

Affiliations:

[1] Nullmax, Beijing, Peoples R China

[2] Zhengzhou Univ, Zhengzhou, Peoples R China

Source:

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023

DOI:

10.1109/WACV56688.2023.00588

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Semantic segmentation in bird's eye view (BEV) is an important task for autonomous driving. Though this task has attracted considerable research effort, it is still challenging to flexibly cope with arbitrary (single or multiple) camera sensors mounted on an autonomous vehicle. In this paper, we present BEVSegFormer, an effective transformer-based method for BEV semantic segmentation from arbitrary camera rigs. Specifically, our method first encodes image features from arbitrary cameras with a shared backbone. These image features are then enhanced by a deformable transformer-based encoder. Moreover, we introduce a BEV transformer decoder module to parse BEV semantic segmentation results. An efficient multi-camera deformable attention unit is designed to carry out the BEV-to-image view transformation. Finally, the queries are reshaped according to the layout of grids in the BEV, and upsampled to produce the semantic segmentation result in a supervised manner. We evaluate the proposed algorithm on the public nuScenes dataset and a self-collected dataset. Experimental results show that our method achieves promising performance on BEV semantic segmentation from arbitrary camera rigs. We also demonstrate the effectiveness of each component via an ablation study.
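The last step of the pipeline, reshaping the queries to the BEV grid layout and upsampling them, can be illustrated with a toy example. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify the upsampling operator, so nearest-neighbour repetition is swapped in here, each query is assumed to already hold per-class scores, and the function names (`reshape_queries_to_grid`, `upsample_nearest`, `argmax_labels`) and the 2 x 2 grid are hypothetical.

```python
def reshape_queries_to_grid(queries, grid_h, grid_w):
    """Arrange a flat list of query vectors into a grid_h x grid_w grid (row-major)."""
    if len(queries) != grid_h * grid_w:
        raise ValueError("number of queries must equal grid_h * grid_w")
    return [queries[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling (an assumption; the paper's operator may differ).

    Every cell is repeated `factor` times along both the row and column axes.
    """
    out = []
    for row in grid:
        wide = [cell for cell in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def argmax_labels(grid):
    """Pick the index of the highest score in every cell to get a label map."""
    return [[max(range(len(cell)), key=cell.__getitem__) for cell in row]
            for row in grid]


# Four hypothetical queries, each holding scores for two classes.
queries = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]
bev = reshape_queries_to_grid(queries, grid_h=2, grid_w=2)
labels = argmax_labels(upsample_nearest(bev, factor=2))
```

Here each query corresponds to one BEV grid cell, so the 2 x 2 grid becomes a 4 x 4 label map after upsampling by a factor of 2. In a trained model the upsampling would more plausibly be learned and followed by a classification head; the fixed repetition above only shows how the query layout maps onto the BEV plane.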

Pages: 5924-5932

Page count: 9
