TFEdet: Efficient Multi-Frame 3D Object Detector via Proposal-Centric Temporal Feature Extraction

Cited by: 0
Authors
Kim, Jongho [1]
Sagong, Sungpyo [1]
Yi, Kyongsu [1]
Affiliations
[1] Seoul Natl Univ, Dept Mech Engn, Seoul 08826, South Korea
Keywords
Proposals; Feature extraction; Point cloud compression; Detectors; Three-dimensional displays; Autonomous vehicles; Transformers; Convolution; Object detection; Laser radar; 3D object detection; multi-frame detection; autonomous driving; LiDAR point cloud; gated recurrent unit
DOI
10.1109/ACCESS.2024.3482093
CLC Classification Number
TP [Automation technology; computer technology]
Subject Classification Code
0812
Abstract
This paper proposes the Temporal Feature Extraction Detector (TFEdet), a novel deep-learning-based multi-frame 3D object detector that efficiently exploits temporal features from consecutive point clouds. To leverage previously processed frames, inter-frame bipartite matching is performed between the current detections of a pre-trained single-frame detector and the predicted prior detections, taking ego-motion into account. Based on this inter-frame association, two types of proposed temporal features are accumulated: temporal proposal features, which aggregate the single-frame features of each proposal, and inter-frame proposal features, which contain explicit information about changes between frames. The collected temporal features are then encoded by a Gated Recurrent Unit (GRU)-based temporal feature extraction head and added as residuals to the current-frame proposals, yielding the final detections. In performance evaluations on the nuScenes dataset, the proposed TFEdet, which processes a relatively small number of point clouds, runs at more than twice the frames per second of previous multi-frame detectors while still demonstrating competitive detection performance through effective use of temporal proposal features.
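As a rough illustration of the pipeline summarized above, the following PyTorch sketch shows one plausible form of the ego-motion-compensated inter-frame matching and of a GRU-based temporal feature extraction head with a residual connection to the current-frame proposals. All names (match_proposals, TemporalFeatureExtractionHead), tensor shapes, the constant-velocity prediction, and the center-distance matching cost are assumptions made for this example and are not taken from the paper.

import torch
import torch.nn as nn
from scipy.optimize import linear_sum_assignment

def match_proposals(prev_centers, prev_velocity, curr_centers, dt, prev_to_curr):
    # Predict where the previous detections should be now (constant-velocity
    # assumption), then map them into the current ego frame so the matching
    # is compensated for ego-motion. prev_to_curr is a 4x4 homogeneous transform.
    predicted = prev_centers + prev_velocity * dt                        # (M, 3)
    ones = torch.ones(predicted.shape[0], 1)
    predicted = (torch.cat([predicted, ones], dim=1) @ prev_to_curr.T)[:, :3]
    # Center-distance cost matrix; the Hungarian algorithm gives the
    # inter-frame bipartite matching between prior and current proposals.
    cost = torch.cdist(predicted, curr_centers)                          # (M, N)
    prev_idx, curr_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return prev_idx, curr_idx

class TemporalFeatureExtractionHead(nn.Module):
    # GRU-based head: encodes the accumulated per-proposal feature history and
    # adds the encoded summary to the current-frame proposal features as a residual.
    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, temporal_feats, curr_feats):
        # temporal_feats: (num_proposals, T, feat_dim), oldest frame first
        # curr_feats:     (num_proposals, feat_dim) single-frame proposal features
        _, h_n = self.gru(temporal_feats)          # h_n: (1, num_proposals, hidden_dim)
        residual = self.out(h_n.squeeze(0))        # (num_proposals, feat_dim)
        return curr_feats + residual               # refined features for the final detection

In this sketch, the refined features would be passed to the detector's existing box regression and classification layers; the inter-frame proposal features mentioned in the abstract (e.g., relative displacement between matched boxes) could be concatenated to temporal_feats before the GRU.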
Pages: 154526-154534
Number of pages: 9
Related Papers
35 records in total
[1]   TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [J].
Bai, Xuyang ;
Hu, Zeyu ;
Zhu, Xinge ;
Huang, Qingqiu ;
Chen, Yilun ;
Fu, Hangbo ;
Tai, Chiew-Lan .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :1080-1089
[2]   nuScenes: A multimodal dataset for autonomous driving [J].
Caesar, Holger ;
Bankiti, Varun ;
Lang, Alex H. ;
Vora, Sourabh ;
Liong, Venice Erin ;
Xu, Qiang ;
Krishnan, Anush ;
Pan, Yu ;
Baldan, Giancarlo ;
Beijbom, Oscar .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :11618-11628
[3]  
Chen Q., 2020, Adv. Neural Inf. Process. Syst., V33, P21224
[4]  
Chen YK, 2023, arXiv:2303.11301, DOI 10.48550/arXiv.2303.11301
[5]   Focal Sparse Convolutional Networks for 3D Object Detection [J].
Chen, Yukang ;
Li, Yanwei ;
Zhang, Xiangyu ;
Sun, Jian ;
Jia, Jiaya .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :5418-5427
[6]  
Chung JY, 2014, arXiv:1412.3555, DOI 10.48550/arXiv.1412.3555
[7]   VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [J].
Deng, Shengheng ;
Liang, Zhihao ;
Sun, Lin ;
Jia, Kui .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, :8438-8447
[8]  
Ge R., 2020, arXiv:2006.12671, DOI 10.48550/arXiv.2006.12671
[9]   Vision meets robotics: The KITTI dataset [J].
Geiger, A. ;
Lenz, P. ;
Stiller, C. ;
Urtasun, R. .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2013, 32 (11) :1231-1237
[10]  
Graham B, 2017, arXiv:1706.01307, DOI 10.48550/arXiv.1706.01307