Multitarget Real-Time Tracking Algorithm Based on Transformer and BYTE

Cited by: 0
Authors
Pan Hao [1]
Liu Xiang [1]
Zhao Jingwen [1]
Zhang Xing [2,3]
Affiliations
[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai 201620, Peoples R China
[2] Shanghai Univ Engn Sci, Sch Management, Shanghai 201620, Peoples R China
[3] Jiangsu Univ, Automot Engn Res Inst, Zhenjiang 212013, Jiangsu, Peoples R China
Keywords
multitarget tracking; YOLOX; BYTE; Transformer; complex scene
DOI
10.3788/LOP220514
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Code
0808; 0809
Abstract
To solve the problems of trajectory missed detection, false detection, and identity switching in complex multitarget tracking scenes, this paper proposes a multitarget tracking algorithm based on an improved YOLOX detector and the BYTE data association method. First, to enhance YOLOX's target detection capability in complex environments, we combine the YOLOX backbone network with a Vision Transformer to improve the network's local feature extraction capability and adopt the α-GIoU loss function to further improve the regression accuracy of the network's bounding boxes. Second, to meet the real-time requirements of the algorithm, we employ the BYTE data association method and abandon the traditional re-identification (Re-ID) network, further increasing the speed of the proposed multitarget tracking algorithm. Finally, to mitigate tracking problems in complex environments, such as illumination changes and occlusion, we adopt the extended Kalman filter, which adapts better to nonlinear systems, to improve the prediction accuracy of tracking trajectories in complex scenes. The experimental results show that the multiple object tracking accuracy (MOTA) and identity F1 measure (IDF1) of the proposed algorithm on the MOT17 dataset are 73.0% and 70.2%, respectively, improvements of 1.3 and 2.1 percentage points over the current state-of-the-art ByteTrack algorithm, while the number of identity switches (IDSW) is reduced by 3.7%. Meanwhile, the proposed algorithm achieves a tracking speed of 51.2 frame/s, which meets the real-time requirements of the system.
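
The α-GIoU term mentioned above belongs to the Alpha-IoU family of power IoU losses, in which both the IoU term and the GIoU enclosing-box penalty are raised to a power α. The PyTorch function below is only a minimal sketch of that formulation, not the authors' implementation; the function name alpha_giou_loss, the default alpha=3.0, and the (x1, y1, x2, y2) box format are assumptions made for illustration.

import torch

def alpha_giou_loss(pred, target, alpha=3.0, eps=1e-7):
    """Alpha-GIoU bounding-box regression loss (illustrative sketch).

    pred, target: tensors of shape (N, 4) in (x1, y1, x2, y2) format.
    alpha: power parameter of the Alpha-IoU family; alpha = 1 recovers
    the ordinary GIoU loss.
    """
    # Intersection rectangle
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # Areas and union
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # Smallest enclosing box C
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    area_c = (cx2 - cx1) * (cy2 - cy1) + eps

    # Alpha-GIoU: raise both the IoU term and the enclosing-area penalty
    # to the power alpha.
    giou_penalty = (area_c - union) / area_c
    loss = 1 - iou.pow(alpha) + giou_penalty.pow(alpha)
    return loss.mean()

Setting alpha = 1 reduces the function to the ordinary GIoU loss, which is one way to sanity-check the sketch.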
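
The BYTE association step keeps low-score detections for a second matching stage instead of discarding them, matching predicted track boxes to detections by IoU alone since no Re-ID features are extracted. The sketch below illustrates that two-stage matching under assumed names and thresholds (byte_associate, iou_matrix, high_thresh=0.6, iou_thresh=0.3); the extended Kalman filter prediction of track boxes mentioned in the abstract is assumed to happen upstream and is not shown.

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(tracks, dets):
    """Pairwise IoU between predicted track boxes and detection boxes,
    both given as (N, 4) arrays in (x1, y1, x2, y2) format."""
    ious = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            x1, y1 = max(t[0], d[0]), max(t[1], d[1])
            x2, y2 = min(t[2], d[2]), min(t[3], d[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            union = ((t[2] - t[0]) * (t[3] - t[1])
                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
            ious[i, j] = inter / union if union > 0 else 0.0
    return ious

def byte_associate(track_boxes, det_boxes, det_scores,
                   high_thresh=0.6, iou_thresh=0.3):
    """Two-stage BYTE-style association on IoU only (no Re-ID features).

    Stage 1 matches predicted track boxes to high-score detections;
    stage 2 matches the leftover tracks to low-score detections, so that
    occluded or blurred targets are not simply discarded.
    Returns (matches, unmatched_tracks, unmatched_high_dets).
    """
    high = np.where(det_scores >= high_thresh)[0]
    low = np.where(det_scores < high_thresh)[0]
    matches, remaining_tracks = [], list(range(len(track_boxes)))

    for det_pool in (high, low):                 # stage 1, then stage 2
        if len(remaining_tracks) == 0 or len(det_pool) == 0:
            continue
        cost = 1.0 - iou_matrix(track_boxes[remaining_tracks],
                                det_boxes[det_pool])
        rows, cols = linear_sum_assignment(cost)
        still_remaining = list(remaining_tracks)
        for r, c in zip(rows, cols):
            if cost[r, c] <= 1.0 - iou_thresh:   # IoU above the gate
                matches.append((remaining_tracks[r], det_pool[c]))
                still_remaining.remove(remaining_tracks[r])
        remaining_tracks = still_remaining

    matched_dets = {d for _, d in matches}
    unmatched_high = [d for d in high if d not in matched_dets]
    return matches, remaining_tracks, unmatched_high

In a full tracker, the unmatched high-score detections returned here would typically initialize new tracks, while tracks left unmatched after both stages are usually kept alive for a few frames before removal.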
Pages: 8