DINO-Tracker: Taming DINO for Self-supervised Point Tracking in a Single Video

Cited by: 0
Authors
Tumanyan, Narek [1]
Singer, Assaf [1]
Bagon, Shai [1]
Dekel, Tali [1]
Affiliations
[1] Weizmann Inst Sci, Rehovot, Israel
DOI
10.1007/978-3-031-73347-5_21
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present DINO-Tracker, a new framework for long-term dense tracking in video. The pillar of our approach is combining test-time training on a single video with the powerful localized semantic features learned by a pre-trained DINO-ViT model. Specifically, our framework simultaneously adopts DINO's features to fit the motion observations of the test video, while training a tracker that directly leverages the refined features. The entire framework is trained end-to-end using a combination of self-supervised losses and regularization that allows us to retain and benefit from DINO's semantic prior. Extensive evaluation demonstrates that our method achieves state-of-the-art results on known benchmarks. DINO-Tracker significantly outperforms self-supervised methods and is competitive with state-of-the-art supervised trackers, while outperforming them in challenging cases of tracking under long-term occlusions.
Pages: 367-385
Number of pages: 19
Related papers
50 items in total
  • [21] Self-supervised learning for robust video indexing
    Ewerth, Ralph
    Freisleben, Bernd
    2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 1749 - +
  • [22] Self-Supervised Video Representation Learning by Video Incoherence Detection
    Cao, Haozhi
    Xu, Yuecong
    Mao, Kezhi
    Xie, Lihua
    Yin, Jianxiong
    See, Simon
    Xu, Qianwen
    Yang, Jianfei
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (06) : 3810 - 3822
  • [23] Self-supervised indoor scene point cloud completion from a single panorama
    Li, Tong
    Zhang, Zhaoxuan
    Wang, Yuxin
    Cui, Yan
    Li, Yuqi
    Zhou, Dongsheng
    Yin, Baocai
    Yang, Xin
    VISUAL COMPUTER, 2025, 41 (03) : 1891 - 1905
  • [24] Tracking-by-Self Detection: A Self-supervised Framework for Multiple Animal Tracking
    Narayan, C. B. Dev
    Rahman, Fayaz
    Ullah, Mohib
    Cheikh, Faouzi Alaya
    Imran, Ali Shariq
    Coello, Christopher
    Nordbo, Oyvind
    Kumar, G. Santhosh
    Nair, Madhu S.
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2023, PT I, 2023, 675 : 561 - 572
  • [25] Self-Supervised Camera Self-Calibration from Video
    Fang, Jiading
    Vasiljevic, Igor
    Guizilini, Vitor
    Ambrus, Rares
    Shakhnarovich, Greg
    Gaidon, Adrien
    Walter, Matthew R.
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 8468 - 8475
  • [26] The online scene-adaptive tracker based on self-supervised learning
    Xiaoyu Chen
    Mingyang Chen
    Jinru Hang
    Fengchen He
    Wei Qi
    Jing Han
    Multimedia Tools and Applications, 2023, 82 : 15695 - 15713
  • [27] The online scene-adaptive tracker based on self-supervised learning
    Chen, Xiaoyu
    Chen, Mingyang
    Hang, Jinru
    He, Fengchen
    Qi, Wei
    Han, Jing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (10) : 15695 - 15713
  • [28] DINO-CXR: A Self Supervised Method Based on Vision Transformer for Chest X-Ray Classification
    Shakouri, Mohammadreza
    Iranmanesh, Fatemeh
    Eftekhari, Mahdi
    ADVANCES IN VISUAL COMPUTING, ISVC 2023, PT II, 2023, 14362 : 320 - 331
  • [29] Video Face Clustering with Self-Supervised Representation Learning
    Sharma V.
    Tapaswi M.
    Saquib Sarfraz M.
    Stiefelhagen R.
    IEEE Transactions on Biometrics, Behavior, and Identity Science, 2020, 2 (02) : 145 - 157
  • [30] Self-Supervised Learning for Action Recognition by Video Denoising
    Thi Thu Trang Phung
    Thi Hong Thu Ma
    Van Truong Nguyen
    Duc Quang Vu
    2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021), 2021, : 76 - 81