PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

Cited by: 1
Authors
Li, Bowen [1]
Huang, Ziyuan [2]
Ye, Junjie [3]
Li, Yiming [4]
Scherer, Sebastian [1]
Zhao, Hang [5]
Fu, Changhong [3]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Tongji Univ, Shanghai, Peoples R China
[4] NYU, New York, NY USA
[5] Tsinghua Univ, Beijing, Peoples R China
Source
2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023) | 2023
Keywords
DOI
10.1109/ICCV51070.2023.00918
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual object tracking is essential to intelligent robots. Most existing approaches have ignored the online latency that can cause severe performance degradation during real-world processing. Especially for unmanned aerial vehicles (UAVs), where robust tracking is more challenging and onboard computation is limited, the latency issue can be fatal. In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). Unlike existing solutions that naively append Kalman filters after trackers, PVT++ can be jointly optimized, so that it not only takes motion information but can also leverage the rich visual knowledge in most pretrained tracker models for robust prediction. Besides, to bridge the training-evaluation domain gap, we propose a relative motion factor, empowering PVT++ to generalize to the challenging and complex UAV tracking scenes. These careful designs have made the small-capacity lightweight PVT++ a widely effective solution. Additionally, this work presents an extended latency-aware evaluation benchmark for assessing an any-speed tracker in the online setting. Empirical results on a robotic platform from the aerial perspective show that PVT++ can achieve significant performance gain on various trackers and exhibit higher accuracy than prior solutions, largely mitigating the degradation brought by latency. Our code is public at https://github.com/Jaraxxus-Me/PVT_pp.git.
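To make the latency problem concrete, the following is a minimal sketch of a motion-only compensation step of the kind the abstract contrasts PVT++ against. It is not the authors' method and not a full Kalman filter: it is plain constant-velocity extrapolation. The `Box` type, the `extrapolate` function, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box: center (cx, cy), size (w, h), observed at time t in seconds."""
    t: float
    cx: float
    cy: float
    w: float
    h: float


def extrapolate(prev: Box, last: Box, t_query: float) -> Box:
    """Predict the box at t_query, assuming constant center velocity
    between the two most recent tracker outputs (hypothetical helper)."""
    dt = last.t - prev.t
    if dt <= 0:
        raise ValueError("observations must be strictly increasing in time")
    horizon = t_query - last.t  # how far past the latest output we predict
    vx = (last.cx - prev.cx) / dt
    vy = (last.cy - prev.cy) / dt
    # Box size is held fixed; only the center is moved forward in time.
    return Box(t_query, last.cx + vx * horizon, last.cy + vy * horizon, last.w, last.h)


# Hypothetical scenario: the tracker produced outputs for frames captured at
# t = 0.0 s and t = 0.5 s, but the world is already at t = 1.0 s when the
# result is needed.
prev = Box(t=0.0, cx=100.0, cy=50.0, w=40.0, h=30.0)
last = Box(t=0.5, cx=110.0, cy=45.0, w=40.0, h=30.0)
pred = extrapolate(prev, last, t_query=1.0)
print(pred)
```

Such a predictor uses only past box coordinates. According to the abstract, PVT++ differs by being trained jointly with the tracker so that the prediction can also draw on the tracker's visual features.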
Pages: 9972-9982
Page count: 11