PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

Cited by: 1
Authors
Li, Bowen [1]
Huang, Ziyuan [2]
Ye, Junjie [3]
Li, Yiming [4]
Scherer, Sebastian [1]
Zhao, Hang [5]
Fu, Changhong [3]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Tongji Univ, Shanghai, Peoples R China
[4] NYU, New York, NY USA
[5] Tsinghua Univ, Beijing, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
DOI
10.1109/ICCV51070.2023.00918
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual object tracking is essential to intelligent robots. Most existing approaches have ignored the online latency that can cause severe performance degradation during real-world processing. Especially for unmanned aerial vehicles (UAVs), where robust tracking is more challenging and onboard computation is limited, the latency issue can be fatal. In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). Unlike existing solutions that naively append Kalman filters after trackers, PVT++ can be jointly optimized, so that it uses not only motion information but also the rich visual knowledge in most pretrained tracker models for robust prediction. In addition, to bridge the training-evaluation domain gap, we propose a relative motion factor, empowering PVT++ to generalize to challenging and complex UAV tracking scenes. These careful designs have made the small-capacity, lightweight PVT++ a widely effective solution. This work also presents an extended latency-aware evaluation benchmark for assessing an any-speed tracker in the online setting. Empirical results on a robotic platform from the aerial perspective show that PVT++ can achieve significant performance gains on various trackers and exhibit higher accuracy than prior solutions, largely mitigating the degradation brought by latency. Our code is public at https://github.com/Jaraxxus-Me/PVT_pp.git.
Pages: 9972-9982
Page count: 11