STTracker: Spatio-Temporal Tracker for 3D Single Object Tracking

Cited by: 7
Authors
Cui, Yubo [1 ,2 ]
Li, Zhiheng [1 ,2 ]
Fang, Zheng [1 ,2 ,3 ]
Affiliations
[1] Northeastern Univ, Fac Robot Sci & Engn, Shenyang 110819, Peoples R China
[2] Northeastern Univ, Natl Frontiers Sci Ctr Ind Intelligence & Syst Opt, Shenyang 110819, Peoples R China
[3] Northeastern Univ, Key Lab Data Analyt & Optimizat Smart Ind, Minist Educ, Shenyang 110819, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep learning; point cloud;
DOI
10.1109/LRA.2023.3290524
Chinese Library Classification (CLC)
TP24 [Robotics];
Discipline Code
080202; 1405;
Abstract
3D single object tracking with point clouds is a critical task in 3D computer vision. Previous methods usually take the last two frames as input, using the predicted box to obtain the template point cloud from the previous frame and the search-area point cloud from the current frame, and then apply similarity-based or motion-based methods to predict the current box. Although these methods achieve good tracking performance, they ignore the historical information of the target, which is important for tracking. In this letter, instead of inputting two frames of point clouds, we input multiple frames to encode the spatio-temporal information of the target and implicitly learn its motion, which builds correlations among different frames so the target can be tracked efficiently in the current frame. Meanwhile, rather than directly using point features for feature fusion, we first crop the point cloud features into many patches, then use a sparse attention mechanism to encode the patch-level similarity, and finally fuse the multi-frame features. Extensive experiments show that our method achieves competitive results on challenging large-scale benchmarks (62.6% on KITTI and 49.66% on NuScenes). The code will be released soon.
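The abstract's patch-level fusion can be illustrated with a minimal sketch: per-frame point features are pooled into patches, each current-frame patch attends only to its top-k most similar historical patches (one plausible reading of "sparse attention"; the paper's actual mechanism, pooling, and hyperparameters may differ), and the attended features are fused back into the current frame. All function names and the top-k sparsification scheme below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries get zero weight.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def patch_pool(points, num_patches):
    # Crop N point features into num_patches groups and mean-pool each
    # (a stand-in for whatever patch cropping the paper uses).
    return np.stack([p.mean(axis=0) for p in np.array_split(points, num_patches)])


def sparse_patch_fusion(frames, num_patches=8, top_k=4):
    """frames: list of (N, C) per-frame point features, last entry = current frame.
    Returns (num_patches, C) fused patch features for the current frame."""
    patches = [patch_pool(f, num_patches) for f in frames]   # T x (P, C)
    query = patches[-1]                                      # current frame
    keys = np.concatenate(patches[:-1], axis=0)              # (P*(T-1), C)
    scores = query @ keys.T / np.sqrt(query.shape[1])        # patch-level similarity
    # Sparse attention: each query patch keeps only its top_k historical patches.
    kth = np.partition(scores, -top_k, axis=1)[:, -top_k]
    scores = np.where(scores >= kth[:, None], scores, -np.inf)
    attn = softmax(scores, axis=1)
    return query + attn @ keys                               # residual fusion
```

With four frames of 32 points and 16 channels each, `sparse_patch_fusion` returns an (8, 16) array: eight current-frame patches, each enriched by its most similar patches from the three historical frames.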
Pages: 4967-4974
Number of pages: 8