PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

Times cited: 1

Authors:

Li, Bowen [1]
Huang, Ziyuan [2]
Ye, Junjie [3]
Li, Yiming [4]
Scherer, Sebastian [1]
Zhao, Hang [5]
Fu, Changhong [3]

Affiliations:

[1] Carnegie Mellon Univ, Pittsburgh, PA USA

[2] Natl Univ Singapore, Singapore, Singapore

[3] Tongji Univ, Shanghai, Peoples R China

[4] NYU, New York, NY USA

[5] Tsinghua Univ, Beijing, Peoples R China

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023

DOI:

10.1109/ICCV51070.2023.00918

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual object tracking is essential to intelligent robots. Most existing approaches have ignored online latency, which can cause severe performance degradation during real-world processing. Especially for unmanned aerial vehicles (UAVs), where robust tracking is more challenging and onboard computation is limited, the latency issue can be fatal. In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). Unlike existing solutions that naively append Kalman Filters after trackers, PVT++ can be jointly optimized, so that it can exploit not only motion information but also the rich visual knowledge in most pretrained tracker models for robust prediction. In addition, to bridge the training-evaluation domain gap, we propose a relative motion factor, enabling PVT++ to generalize to challenging and complex UAV tracking scenes. These careful designs make the small-capacity, lightweight PVT++ a widely effective solution. Additionally, this work presents an extended latency-aware evaluation benchmark for assessing trackers of any speed in the online setting. Empirical results on a robotic platform from the aerial perspective show that PVT++ can achieve significant performance gains on various trackers and exhibit higher accuracy than prior solutions, largely mitigating the degradation brought by latency. Our code is public at https://github.com/Jaraxxus-Me/PVT_pp.git.
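
The abstract attributes the accuracy loss to online latency and contrasts PVT++ with motion-only predictors such as appended Kalman filters. The sketch below illustrates that effect under a simplified streaming protocol that is an assumption, not the paper's benchmark definition: frames arrive at a fixed rate, a hypothetical tracker with a fixed per-frame latency always processes the newest available frame, and each frame is scored against the latest result that had finished by its timestamp. A constant-velocity extrapolator stands in for a plain motion-model baseline; it is not PVT++, which learns its predictor jointly with visual features. The names `Output`, `run_tracker`, and `streaming_error` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Output:
    frame_time: float  # timestamp of the frame the tracker processed
    done_time: float   # wall-clock time at which the result became available
    position: float    # estimated object position (1-D for brevity)
    velocity: float    # finite-difference velocity estimate (units/s)


def run_tracker(gt: List[float], fps: float, latency: float) -> List[Output]:
    """Simulate an online tracker with a fixed per-frame latency.

    Hypothetical assumptions: the tracker is exact on the frame it processes
    (so all error comes from latency), and whenever it becomes idle it grabs
    the newest frame that has arrived, skipping any frames in between.
    """
    dt = 1.0 / fps
    outputs: List[Output] = []
    clock, last_idx = 0.0, -1
    while True:
        idx = int(clock / dt + 1e-9)  # newest frame available at `clock`
        if idx <= last_idx:           # already processed: wait for the next frame
            idx = last_idx + 1
            clock = idx * dt
        if idx >= len(gt):
            break
        frame_time = idx * dt
        velocity = 0.0
        if outputs:
            prev = outputs[-1]
            velocity = (gt[idx] - prev.position) / (frame_time - prev.frame_time)
        done_time = clock + latency
        outputs.append(Output(frame_time, done_time, gt[idx], velocity))
        last_idx, clock = idx, done_time
    return outputs


def streaming_error(gt: List[float], fps: float, outputs: List[Output],
                    predict: bool) -> float:
    """Mean absolute error when each frame is scored against the latest
    result that had finished by that frame's timestamp."""
    dt = 1.0 / fps
    errors = []
    j = -1  # index of the latest finished output
    for i, true_pos in enumerate(gt):
        t = i * dt
        while j + 1 < len(outputs) and outputs[j + 1].done_time <= t + 1e-9:
            j += 1
        if j < 0:
            continue  # nothing available yet; this sketch simply skips the frame
        out = outputs[j]
        estimate = out.position
        if predict:
            # Constant-velocity extrapolation to the query time (a simple
            # motion-model baseline, not PVT++'s learned predictor).
            estimate += out.velocity * (t - out.frame_time)
        errors.append(abs(estimate - true_pos))
    return sum(errors) / len(errors) if errors else float("nan")


gt = [5.0 * i / 30 for i in range(300)]  # object moving at 5 units/s, 30 fps
outs = run_tracker(gt, fps=30.0, latency=0.05)
print(streaming_error(gt, 30.0, outs, predict=False))
print(streaming_error(gt, 30.0, outs, predict=True))
```

With a 50 ms latency, every scored frame in the naive case is stale by at least one latency interval, so its error is at least 0.25 units, while extrapolation removes nearly all of that error for this perfectly regular motion. Real UAV motion is rarely this regular, which is where a motion-only extrapolator is expected to struggle.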

Pages: 9972-9982

Number of pages: 11