TransMRE: Multiple Observation Planes Representation Encoding With Fully Sparse Voxel Transformers for 3-D Object Detection

Times Cited: 0
Authors
Zhu, Ziming [1]
Zhu, Yu [1]
Zhang, Kezhi [1]
Li, Hangyu [1]
Ling, Xiaofeng [1,2]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] East China Univ Sci & Technol, Shanghai Key Lab Intelligent Sensing & Detect Tech, Shanghai 200237, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Feature extraction; Point cloud compression; Object detection; Transformers; Encoding; Accuracy; Laser radar; Autonomous vehicles; Vectors; 3-D object detection; autonomous driving; deep learning; LiDAR; multiple observation planes representation encoding; point cloud; voxel feature factorizing;
DOI
10.1109/TIM.2024.3480206
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Code
0808; 0809;
Abstract
The effective representation and feature extraction of 3-D scenes from sparse, unstructured point clouds remain a significant challenge in 3-D object detection. In this article, we propose TransMRE, a network that enables fully sparse multiple-observation-plane feature fusion using LiDAR point clouds as the single-modal input. TransMRE achieves this by sparsely factorizing a 3-D voxel scene into three separate observation planes: the XY, XZ, and YZ planes. In addition, we propose Observation Plane Sparse Fusion and Interaction to explore the internal relationships between the different observation planes. A Transformer mechanism realizes feature attention both within a single observation plane and across multiple observation planes, and this attention is applied recursively while aggregating the projected features of the observation planes, so that the entire 3-D scene is modeled effectively. This design addresses the insufficient feature representation ability of a single bird's-eye view (BEV) constructed from extremely sparse point clouds. Furthermore, TransMRE keeps the entire network fully sparse, eliminating the need to convert sparse feature maps into dense ones. As a result, it can be applied to LiDAR point cloud data with large scanning ranges, such as Argoverse 2, while maintaining low computational complexity. Extensive experiments demonstrate the effectiveness of TransMRE: it achieves 64.9 mAP and 70.4 NDS on the nuScenes detection benchmark and 32.3 mAP on the Argoverse 2 detection benchmark, outperforming state-of-the-art methods.
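The core factorization described above (projecting a sparse 3-D voxel scene onto the XY, XZ, and YZ observation planes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the use of max-pooling as the per-cell aggregation, and the data layout are all illustrative assumptions.

```python
import numpy as np

def factorize_to_planes(coords, feats):
    """Illustrative sketch: project sparse voxel features onto the XY, XZ,
    and YZ observation planes, aggregating voxels that share a 2-D cell
    with a max-pool (the aggregation choice is an assumption, not the
    paper's method).

    coords: (N, 3) integer voxel indices (x, y, z)
    feats:  (N, C) per-voxel features
    Returns: dict mapping plane name -> (unique 2-D cells, pooled features),
    keeping the representation sparse (only occupied cells are stored).
    """
    planes = {"XY": (0, 1), "XZ": (0, 2), "YZ": (1, 2)}
    out = {}
    for name, axes in planes.items():
        cells = coords[:, axes]  # drop the axis orthogonal to this plane
        uniq, inv = np.unique(cells, axis=0, return_inverse=True)
        pooled = np.full((len(uniq), feats.shape[1]), -np.inf)
        np.maximum.at(pooled, inv, feats)  # unbuffered max-pool per cell
        out[name] = (uniq, pooled)
    return out
```

In a fully sparse pipeline, each of the three pooled plane representations would then be fed to per-plane and cross-plane attention; the point of the sketch is only that the factorization never materializes a dense 3-D grid.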
Pages: 13