TM2B: Transformer-Based Motion-to-Box Network for 3D Single Object Tracking on Point Clouds

Cited by: 0
Authors
Xu, Anqi [1]
Nie, Jiahao [1]
He, Zhiwei [1]
Lv, Xudong [1]
Affiliations
[1] Sch Hangzhou Dianzi Univ, Hangzhou 310018, Peoples R China
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2024 / Vol. 9 / No. 08
Keywords
Transformers; Accuracy; Three-dimensional displays; Target tracking; Object tracking; Feature extraction; Point cloud compression; 3D single object tracking; motion-to-box; transformer
DOI
10.1109/LRA.2024.3418274
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
3D single object tracking plays a crucial role in numerous applications such as autonomous driving. Recent trackers based on the motion-centric paradigm perform well because they exploit motion cues to infer the target's relative motion across successive frames, which effectively overcomes the significant appearance variations of targets and distractors caused by occlusion. However, the motion-centric paradigm tends to require a multi-stage motion-to-box process to refine the motion cues, which suffers from tedious hyper-parameter tuning and elaborate subtask designs. In this letter, we propose a novel transformer-based motion-to-box network (TM2B), which employs a learnable relation modeling transformer (LRMT) to generate accurate motion cues without multi-stage refinement. The proposed LRMT contains two novel attention mechanisms: hierarchical interactive attention and learnable query attention. The former builds a learnable, fixed-size sampling set for each query on multi-scale feature maps, enabling each query to adaptively select prominent sampling elements and thus to encode multi-scale features effectively in a lightweight manner. The latter computes a weighted sum of the encoded features with a learnable global query, which makes it possible to extract valuable motion cues from all available features and thereby achieve accurate object tracking. Extensive experiments demonstrate that TM2B achieves state-of-the-art performance on KITTI, NuScenes and Waymo Open Dataset, while obtaining a significant improvement in inference speed over previous leading methods, achieving 56.8 FPS on a single NVIDIA 1080Ti GPU. The code is available at TM2B.
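The abstract describes learnable query attention as a weighted sum of the encoded features under a learnable global query. The following is a minimal sketch of that general idea only, written as single-query scaled dot-product attention pooling in plain Python. It is not the authors' implementation: the function name `learnable_query_attention`, the absence of projection layers, the scaling by the square root of the dimension, and the toy values are all assumptions made for illustration.

```python
import math


def learnable_query_attention(features, query):
    """Pool N feature vectors into one vector using a single global query.

    features: list of N vectors, each of length D (stand-in for encoded features).
    query:    one vector of length D (hypothetical; a trained parameter in practice).
    Returns the pooled vector and the attention weights.
    """
    dim = len(query)
    # Scaled dot-product score between the query and every feature vector.
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in features
    ]
    # Numerically stable softmax over the N scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the features, one output value per dimension.
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy example: three 2-D features and a query aligned with the first axis.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
pooled, weights = learnable_query_attention(features, query)
```

In this toy setup, the first and third features score equally against the query and receive larger weights than the second, so the pooled vector leans toward the first axis. In a trained network the query would be learned jointly with the rest of the model rather than set by hand.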
Pages: 7078-7085
Number of pages: 8
Related papers
50 in total
  • [41] PCMG: 3D point cloud human motion generation based on self-attention and transformer
    Ma, Weizhao
    Yin, Mengxiao
    Li, Guiqing
    Yang, Feng
    Chang, Kan
    VISUAL COMPUTER, 2024, 40 (05): : 3765 - 3780
  • [42] PCMG: 3D point cloud human motion generation based on self-attention and transformer
    Weizhao Ma
    Mengxiao Yin
    Guiqing Li
    Feng Yang
    Kan Chang
    The Visual Computer, 2024, 40 : 3765 - 3780
  • [43] VoxT-GNN: A 3D object detection approach from point cloud based on voxel-level transformer and graph neural network
    Zheng, Qiangwen
    Wu, Sheng
    Wei, Jinghui
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (04)
  • [44] TH-NET: A METHOD OF SINGLE 3D OBJECT TRACKING BASED ON TRANSFORMERS AND HAUSDORFF DISTANCE
    Zhang, Zihao
    Sang, Nan
    Wang, Xupeng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2280 - 2284
  • [45] Object tracking method based on joint global and local feature descriptor of 3D LIDAR point cloud
    Qian, Qishu
    Hu, Yihua
    Zhao, Nanxiang
    Li, Minle
    Shao, Fucai
    Zhang, Xinyuan
    CHINESE OPTICS LETTERS, 2020, 18 (06)
  • [46] Object tracking method based on joint global and local feature descriptor of 3D LIDAR point cloud
    钱其姝
    胡以华
    赵楠翔
    李敏乐
    邵福才
    张鑫源
    Chinese Optics Letters, 2020, 18 (06) : 28 - 33
  • [47] Efficient Point-Based Single Scale 3D Object Detection from Traffic Scenes
    Tang, Wenneng
    Li, Yaochen
    Li, Yifan
    Dong, Bo
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT II, 2024, 14426 : 155 - 167
  • [48] 2DSlicesNet: A 2D Slice-Based Convolutional Neural Network for 3D Object Retrieval and Classification
    Taybi, Ilyass Ouazzani
    Gadi, Taoufiq
    Alaoui, Rachid
    IEEE ACCESS, 2021, 9 : 24041 - 24049
  • [49] Transformer-based 2D/3D medical image registration for X-ray to CT via anatomical features
    Qu, Feng
    Zhang, Min
    Shi, Weili
    He, Wei
    Jiang, Zhengang
    INTERNATIONAL JOURNAL OF MEDICAL ROBOTICS AND COMPUTER ASSISTED SURGERY, 2024, 20 (01):
  • [50] Realtime Single-Shot Refinement Neural Network With Adaptive Receptive Field for 3D Object Detection From LiDAR Point Cloud
    Wu, Yutian
    Zhang, Shuwei
    Ogai, Harutoshi
    Inujima, Hiroshi
    Tateno, Shigeyuki
    IEEE SENSORS JOURNAL, 2021, 21 (21) : 24505 - 24519