DINO-Tracker: Taming DINO for Self-supervised Point Tracking in a Single Video

Cited by: 0
Authors
Tumanyan, Narek [1]
Singer, Assaf [1]
Bagon, Shai [1]
Dekel, Tali [1]
Affiliations
[1] Weizmann Inst Sci, Rehovot, Israel
DOI
10.1007/978-3-031-73347-5_21
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present DINO-Tracker, a new framework for long-term dense tracking in video. The pillar of our approach is combining test-time training on a single video with the powerful localized semantic features learned by a pre-trained DINO-ViT model. Specifically, our framework simultaneously adapts DINO's features to fit the motion observations of the test video, while training a tracker that directly leverages the refined features. The entire framework is trained end-to-end using a combination of self-supervised losses and regularization that allows us to retain and benefit from DINO's semantic prior. Extensive evaluation demonstrates that our method achieves state-of-the-art results on known benchmarks. DINO-Tracker significantly outperforms self-supervised methods and is competitive with state-of-the-art supervised trackers, while outperforming them in challenging cases of tracking under long-term occlusions.
Pages: 367-385
Page count: 19