A Single-Stage 3D Object Detection Method Based on Sparse Attention Mechanism

Cited by: 0

Authors:

Jia, Songche ^{[1]}

Zhang, Zhenyu ^{[1,2]}

Affiliations:

[1] Xinjiang Univ, Sch Informat Sci & Engn, Urumqi 830017, Peoples R China

[2] Xinjiang Key Lab Multilingual Informat Technol, Urumqi 830017, Peoples R China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III | 2024 / Vol. 14427

Keywords:

3D object detection; Feature extraction; Sparse attention mechanism;

DOI:

10.1007/978-981-99-8435-0_33

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The Bird's Eye View (BEV) feature extraction module is an important part of 3D object detection based on point cloud data. However, existing methods ignore the correlation between objects, so a large amount of irrelevant information participates in feature extraction and detection accuracy suffers. To solve this problem, this paper proposes a BEV feature extraction method named Dynamic Extraction Of Effective Features (DEF) and designs a single-stage 3D object detection model. The method first uses convolution operations to extract local features. Spatial attention then redistributes the weights of the elements in the BEV feature map, highlighting the positions of critical elements. Next, a sparse two-level routing attention mechanism is applied globally to select the top-k routing regions most strongly correlated with the target region, which avoids interference from irrelevant information. Finally, a token-to-token attention operation is applied to the joint top-k routing regions to extract effective features. Results on the KITTI benchmark dataset show that the method effectively improves 3D object detection accuracy.
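The last two steps of the abstract (region-level top-k routing, then token-to-token attention over the selected regions) can be illustrated with a small pure-Python sketch. This is not the authors' DEF implementation. It assumes that each region is summarized by the mean of its tokens, that region affinity is a plain dot product, and that the raw features serve as queries, keys, and values with no learned projections. All function names (`topk_routing`, `token_attention`, `sparse_routing_attention`) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vec(tokens):
    # Region descriptor: per-channel mean over the tokens of one region (assumption).
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topk_routing(regions, k):
    # Region-level step: for each region, keep the indices of the k regions
    # whose descriptors have the largest dot-product affinity with it.
    descriptors = [mean_vec(r) for r in regions]
    routes = []
    for q in descriptors:
        affinity = [dot(q, d) for d in descriptors]
        ranked = sorted(range(len(regions)), key=lambda j: affinity[j], reverse=True)
        routes.append(ranked[:k])
    return routes

def token_attention(query_tokens, kv_tokens):
    # Token-level step: scaled dot-product attention from each query token
    # to the gathered key/value tokens (keys and values are the same here).
    dim = len(query_tokens[0])
    out = []
    for q in query_tokens:
        weights = softmax([dot(q, kv) / math.sqrt(dim) for kv in kv_tokens])
        out.append([sum(w * kv[c] for w, kv in zip(weights, kv_tokens))
                    for c in range(dim)])
    return out

def sparse_routing_attention(regions, k):
    routes = topk_routing(regions, k)
    outputs = []
    for i, region in enumerate(regions):
        # Attend only to tokens from the routed regions, not the whole map.
        gathered = [tok for j in routes[i] for tok in regions[j]]
        outputs.append(token_attention(region, gathered))
    return outputs, routes

# Toy BEV map: 4 regions, 2 tokens per region, 2 channels per token.
regions = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.8, 0.2], [1.0, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[-1.0, 0.0], [-0.9, -0.1]],
]
outputs, routes = sparse_routing_attention(regions, k=2)
print(routes)
```

In this toy input, regions 0 and 1 hold similar features, so region 0 routes to itself and region 1, and the dissimilar regions 2 and 3 never enter its attention. That is the sense in which routing limits interference from irrelevant regions.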

References

Pages: 414-425

Page count: 12

27 references in total

[1] Chen C, 2022, AAAI CONF ARTIF INTE, P221

[2] Focal Sparse Convolutional Networks for 3D Object Detection [J]. Chen, Yukang; Li, Yanwei; Zhang, Xiangyu; Sun, Jian; Jia, Jiaya. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 5418-5427

[3] Deng JJ, 2021, AAAI CONF ARTIF INTE, V35, P1201

[4] Dosovitskiy A, 2021, arXiv, arXiv:2010.11929, DOI 10.48550/arXiv.2010.11929

[5] Vision meets robotics: The KITTI dataset [J]. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2013, 32 (11): 1231-1237

[6] 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks [J]. Graham, Benjamin; Engelcke, Martin; van der Maaten, Laurens. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 9224-9232

[7] He QD, 2022, AAAI CONF ARTIF INTE, P870

[8] PointPillars: Fast Encoders for Object Detection from Point Clouds [J]. Lang, Alex H.; Vora, Sourabh; Caesar, Holger; Zhou, Lubing; Yang, Jiong; Beijbom, Oscar. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 12689-12697

[9] Voxel Field Fusion for 3D Object Detection [J]. Li, Yanwei; Qi, Xiaojuan; Chen, Yukang; Wang, Liwei; Li, Zeming; Sun, Jian; Jia, Jiaya. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 1110-1119

[10] Lu HY, 2015, PROC CVPR IEEE, P806, DOI 10.1109/CVPR.2015.7298681
