PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

Cited by: 1
Authors
Li, Bowen [1]
Huang, Ziyuan [2]
Ye, Junjie [3]
Li, Yiming [4]
Scherer, Sebastian [1]
Zhao, Hang [5]
Fu, Changhong [3]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Tongji Univ, Shanghai, Peoples R China
[4] NYU, New York, NY USA
[5] Tsinghua Univ, Beijing, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
DOI
10.1109/ICCV51070.2023.00918
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual object tracking is essential to intelligent robots. Most existing approaches have ignored the online latency that can cause severe performance degradation during real-world processing. Especially for unmanned aerial vehicles (UAVs), where robust tracking is more challenging and onboard computation is limited, the latency issue can be fatal. In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). Unlike existing solutions that naively append Kalman Filters after trackers, PVT++ can be jointly optimized, so that it takes not only motion information but can also leverage the rich visual knowledge in most pretrained tracker models for robust prediction. Besides, to bridge the training-evaluation domain gap, we propose a relative motion factor, empowering PVT++ to generalize to the challenging and complex UAV tracking scenes. These careful designs have made the small-capacity lightweight PVT++ a widely effective solution. Additionally, this work presents an extended latency-aware evaluation benchmark for assessing an any-speed tracker in the online setting. Empirical results on a robotic platform from the aerial perspective show that PVT++ can achieve significant performance gain on various trackers and exhibit higher accuracy than prior solutions, largely mitigating the degradation brought by latency. Our code is public at https://github.com/Jaraxxus-Me/PVT_pp.git.
Pages: 9972-9982
Page count: 11