DFA-SAT: Dynamic Feature Abstraction with Self-Attention-Based 3D Object Detection for Autonomous Driving

Cited by: 9

Authors:

Mushtaq, Husnain [1]

Deng, Xiaoheng [1]

Ali, Mubashir [2]

Hayat, Babur [3]

Raza Sherazi, Hafiz Husnain [4]

Affiliations:

[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China

[2] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, England

[3] Univ Chenab, Dept Comp Sci, Gujrat 50700, Pakistan

[4] Univ West London, Sch Comp & Engn, London W5 5RF, England

Source:

SUSTAINABILITY | 2023 / Vol. 15 / Issue 18

Funding:

National Natural Science Foundation of China;

Keywords:

smart cities; 3D object detection; semantic features learning; self-attention; VEHICLES;

DOI:

10.3390/su151813667

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Autonomous vehicles (AVs) play a crucial role in enhancing urban mobility within the context of a smarter and more connected urban environment. Three-dimensional object detection in AVs is an essential task for comprehending the driving environment to contribute to their safe use in urban environments. Existing 3D LiDAR object detection systems lose many critical point features during the down-sampling process and neglect the crucial interactions between local features, providing insufficient semantic information and leading to subpar detection performance. We propose a dynamic feature abstraction with self-attention (DFA-SAT), which utilizes self-attention to learn semantic features with contextual information by incorporating neighboring data and focusing on vital geometric details. DFA-SAT comprises four modules: object-based down-sampling (OBDS), semantic and contextual feature extraction (SCFE), multi-level feature re-weighting (MLFR), and local and global features aggregation (LGFA). The OBDS module preserves the maximum number of semantic foreground points along with their spatial information. SCFE learns rich semantic and contextual information with respect to spatial dependencies, refining the point features. MLFR decodes all the point features using a channel-wise multi-layered transformer approach. LGFA combines local features with decoding weights for global features using matrix product keys and query embeddings to learn spatial information across each channel. Extensive experiments using the KITTI dataset demonstrate significant improvements over the mainstream methods SECOND and PointPillars, improving the mean average precision (AP) by 6.86% and 6.43%, respectively, on the KITTI test dataset. DFA-SAT yields better and more stable performance for medium and long distances with a limited impact on real-time performance and model parameters, ensuring a transformative shift akin to when automobiles replaced conventional transportation in cities.
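The abstract says self-attention lets each point feature draw on its neighbors. As a rough illustration of that general idea only, here is a minimal sketch of generic scaled dot-product self-attention over a handful of point features. It is not the DFA-SAT modules (OBDS, SCFE, MLFR, LGFA). The function names `softmax` and `self_attention`, the identity query/key/value projections, and the toy feature values are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections (assumption).

    features: list of N point-feature vectors, each of length d.
    Returns (outputs, weights), where weights[i][j] is how strongly
    point i attends to point j.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    weights = []
    for q in features:
        # Similarity of this point (query) to every point (keys).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in features]
        weights.append(softmax(scores))
    # Each output is an attention-weighted average of all point features.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, features)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


# Toy input: three points with 2-D features (made-up values).
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
refined, attn = self_attention(points)
```

In this toy input the first point attends more to the second point, whose features are similar, than to the third. That is the sense in which attention weights let contextually related points contribute more to a refined feature.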

Pages: 21

