Bird's-Eye View Semantic Segmentation for Autonomous Driving through the Large Kernel Attention Encoder and Bilinear-Attention Transform Module

Cited by: 1

Authors:

Li, Ke [1]

Wu, Xuncheng [1]

Zhang, Weiwei [2]

Yu, Wangpengfei [2]

Affiliations:

[1] Shanghai University of Engineering Science, School of Mechanical and Automotive Engineering, Shanghai 201620, People's Republic of China

[2] Shanghai Smart Vehicle Cooperating Innovation Center Co., Shanghai 201805, People's Republic of China

Source:

World Electric Vehicle Journal | 2023 / Vol. 14 / Issue 09

Keywords:

camera; bird's eye view; autonomous driving; view transformation; semantic segmentation; OPTICAL-FLOW;

DOI:

10.3390/wevj14090239

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Building an autonomous driving system requires a detailed and unified semantic representation from multiple cameras. The bird's eye view (BEV) has demonstrated remarkable potential as a comprehensive and unified perspective. However, most current research focuses on innovating the view transform module and overlooks whether the image encoder, a crucial component, can construct long-range feature relationships. Hence, we redesign the image encoder with a large kernel attention mechanism to encode image features. Because the performance gains obtained from complex view transform modules are insignificant, we propose a simple and effective Bilinear-Attention Transform module to lift the dimension completely. Finally, we redesign the BEV encoder with a CNN block of larger kernel size to reduce the distortion of BEV features far from the ego vehicle. Results on the nuScenes dataset confirm that our model outperforms other models with equivalent training settings on the segmentation task and approaches state-of-the-art performance.

References

Pages: 14

34 entries in total

[1] [Anonymous], 2022, P C ROB LEARN, DOI 10.48550/arXiv.2110.06922

[2] Can Yigit Baran, 2021, P IEEE CVF INT C COM, P15661

[3] Carion N, 2020, Img Proc Comp Vis Re, V12346, P213, DOI 10.1007/978-3-030-58452-8_13

[4] Chen SY, 2022, arXiv, DOI arXiv:2206.04584

[5] Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [J]. Ding, Xiaohan; Zhang, Xiangyu; Han, Jungong; Ding, Guiguang. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 11953-11965

[6] Guo MH, 2022, arXiv, DOI arXiv:2202.09741

[7] Simple-BEV: What Really Matters for Multi-Sensor BEV Perception? [J]. Harley, Adam W.; Fang, Zhaoyuan; Li, Jie; Ambrus, Rares; Fragkiadaki, Katerina. 2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023: 2759-2765

[8] Deep Residual Learning for Image Recognition [J]. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778

[9] Hendy N, 2020, arXiv, DOI arXiv:2006.09917

[10] FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras [J]. Hu, Anthony; Murez, Zak; Mohan, Nikhil; Dudas, Sofia; Hawke, Jeffrey; Badrinarayanan, Vijay; Cipolla, Roberto; Kendall, Alex. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 15253-15262
