TM2B: Transformer-Based Motion-to-Box Network for 3D Single Object Tracking on Point Clouds

Cited by: 0
Authors
Xu, Anqi [1 ]
Nie, Jiahao [1 ]
He, Zhiwei [1 ]
Lv, Xudong [1 ]
Affiliations
[1] Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2024, Vol. 9, No. 08
Keywords
Transformers; Accuracy; Three-dimensional displays; Target tracking; Object tracking; Feature extraction; Point cloud compression; 3D single object tracking; motion-to-box; transformer;
DOI
10.1109/LRA.2024.3418274
Chinese Library Classification
TP24 [Robotics]
Subject Classification Codes
080202; 1405
Abstract
3D single object tracking plays a crucial role in numerous applications such as autonomous driving. Recent trackers based on the motion-centric paradigm perform well because they exploit motion cues to infer the target's relative motion across successive frames, which effectively overcomes the significant appearance variations of targets and distractors caused by occlusion. However, such a motion-centric paradigm tends to require multi-stage motion-to-box refinement of the motion cues, which suffers from tedious hyper-parameter tuning and elaborate subtask designs. In this letter, we propose a novel transformer-based motion-to-box network (TM2B), which employs a learnable relation modeling transformer (LRMT) to generate accurate motion cues without multi-stage refinement. The proposed LRMT contains two novel attention mechanisms: hierarchical interactive attention and learnable query attention. The former builds a learnable, fixed-size sampling set for each query on multi-scale feature maps, enabling each query to adaptively select prominent sampling elements and thus encode multi-scale features in a lightweight manner; the latter computes the weighted sum of the encoded features with a learnable global query, enabling valuable motion cues to be extracted from all available features and thereby achieving accurate tracking. Extensive experiments demonstrate that TM2B achieves state-of-the-art performance on KITTI, NuScenes and the Waymo Open Dataset, while obtaining a significant improvement in inference speed over previous leading methods, running at 56.8 FPS on a single NVIDIA 1080Ti GPU. The code is available at TM2B.
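Since the abstract only outlines the two attention mechanisms, the PyTorch sketch below illustrates one plausible reading of them under stated assumptions; it is not the authors' released code, and all module and parameter names (HierarchicalInteractiveAttention, LearnableQueryAttention, n_levels, n_samples, etc.) are hypothetical.

```python
# Illustrative sketch only (NOT TM2B's implementation): a deformable-style
# attention in which each query adaptively samples a fixed number of points
# per feature level, and a learnable global query that pools the encoded
# features into a single motion descriptor.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalInteractiveAttention(nn.Module):
    """Each query predicts a fixed number of sampling offsets per feature
    level and aggregates the bilinearly sampled features with learned
    weights (assumed mechanism, for illustration)."""

    def __init__(self, dim=128, n_levels=3, n_samples=4):
        super().__init__()
        self.n_levels, self.n_samples = n_levels, n_samples
        self.offset_head = nn.Linear(dim, n_levels * n_samples * 2)  # (dx, dy) per sample
        self.weight_head = nn.Linear(dim, n_levels * n_samples)      # weight per sample
        self.proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, feat_maps):
        # queries:    (B, Q, C) query embeddings
        # ref_points: (B, Q, 2) normalized reference locations in [0, 1]
        # feat_maps:  list of n_levels tensors, each (B, C, H_l, W_l)
        B, Q, _ = queries.shape
        offsets = self.offset_head(queries).view(B, Q, self.n_levels, self.n_samples, 2)
        weights = self.weight_head(queries).view(B, Q, self.n_levels * self.n_samples)
        weights = weights.softmax(dim=-1).view(B, Q, self.n_levels, self.n_samples, 1)

        out = 0.0
        for lvl, fmap in enumerate(feat_maps):
            # sampling grid in [-1, 1] as expected by grid_sample
            grid = (ref_points[:, :, None, :] + offsets[:, :, lvl]) * 2.0 - 1.0   # (B, Q, S, 2)
            sampled = F.grid_sample(fmap, grid, align_corners=False)              # (B, C, Q, S)
            sampled = sampled.permute(0, 2, 3, 1)                                 # (B, Q, S, C)
            out = out + (weights[:, :, lvl] * sampled).sum(dim=2)                 # (B, Q, C)
        return self.proj(out)


class LearnableQueryAttention(nn.Module):
    """A learnable global query attends over all encoded features and returns
    their weighted sum as a pooled motion feature."""

    def __init__(self, dim=128):
        super().__init__()
        self.global_query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, encoded):                      # encoded: (B, N, C)
        q = self.global_query.expand(encoded.size(0), -1, -1)
        fused, _ = self.attn(q, encoded, encoded)    # (B, 1, C)
        return fused.squeeze(1)                      # (B, C)
```

In a complete motion-to-box tracker, the pooled feature from such a module would presumably be fed to a small regression head that predicts the inter-frame target motion used to update the 3D box; this wiring is an assumption, not detailed in the abstract.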
Pages: 7078-7085
Page count: 8
Related Papers (50 in total)
  • [31] Robust object tracking techniques for vision-based 3D motion analysis applications
    Knyaz, Vladimir A.
    Zheltov, Sergey Yu.
    Vishnyakov, Boris V.
    OPTICS, PHOTONICS AND DIGITAL TECHNOLOGIES FOR IMAGING APPLICATIONS IV, 2016, 9896
  • [32] MVPointNet: Multi-View Network for 3D Object Based on Point Cloud
    Zhou, Weiguo
    Jiang, Xin
    Liu, Yun-Hui
    IEEE SENSORS JOURNAL, 2019, 19 (24) : 12145 - 12152
  • [33] A Systematic Survey of Transformer-Based 3D Object Detection for Autonomous Driving: Methods, Challenges and Trends
    Zhu, Minling
    Gong, Yadong
    Tian, Chunwei
    Zhu, Zuyuan
    DRONES, 2024, 8 (08)
  • [34] KPTr: Key point transformer for LiDAR-based 3D object detection
    Cao, Jie
    Peng, Yiqiang
    Wei, Hongqian
    Mo, Lingfan
    Fan, Likang
    Wang, Longfei
    MEASUREMENT, 2025, 242
  • [35] P2V-RCNN: Point to Voxel Feature Learning for 3D Object Detection From Point Clouds
    Li, Jiale
    Sun, Yu
    Luo, Shujie
    Zhu, Ziqi
    Dai, Hang
    Krylov, Andrey S.
    Ding, Yong
    Shao, Ling
    IEEE ACCESS, 2021, 9 : 98249 - 98260
  • [36] Research on Object Panoramic 3D Point Cloud Reconstruction System Based on Structure From Motion
    Zhang, Xuejing
    Liu, Jingyan
    Zhang, Bo
    Sun, Lei
    Zhou, Yuhong
    Li, Yuchao
    Zhang, Jun
    Zhang, Hao
    Fan, Xiaofei
    IEEE ACCESS, 2022, 10 : 110064 - 110075
  • [37] Object tracking based on siamese network with 3D attention and multiple graph attention
    Yan, Shilei
    Qi, Yujuan
    Liu, Mengxue
    Wang, Yanjiang
    Liu, Baodi
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 235
  • [38] A 2D/3D model-based object tracking framework
    Polat, E
    Yeasin, M
    Sharma, R
    PATTERN RECOGNITION, 2003, 36 (09) : 2127 - 2141
  • [39] VFL3D: A Single-Stage Fine-Grained Lightweight Point Cloud 3D Object Detection Algorithm Based on Voxels
    Li, Bing
    Chen, Jie
    Li, Xinde
    Xu, Rui
    Li, Qian
    Cao, Yice
    Wu, Jun
    Qu, Lei
    Li, Yingsong
    Diniz, Paulo S. R.
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (09) : 12034 - 12048
  • [40] Range-Aware Attention Network for LiDAR-Based 3D Object Detection With Auxiliary Point Density Level Estimation
    Lu, Yantao
    Hao, Xuetao
    Li, Yilan
    Chai, Weiheng
    Sun, Shiqi
    Velipasalar, Senem
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2025, 74 (01) : 292 - 305