DINO-Tracker: Taming DINO for Self-supervised Point Tracking in a Single Video

Authors
Tumanyan, Narek [1]
Singer, Assaf [1]
Bagon, Shai [1]
Dekel, Tali [1]
Affiliations
[1] Weizmann Inst Sci, Rehovot, Israel
DOI
10.1007/978-3-031-73347-5_21
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present DINO-Tracker, a new framework for long-term dense tracking in video. The pillar of our approach is combining test-time training on a single video with the powerful localized semantic features learned by a pre-trained DINO-ViT model. Specifically, our framework simultaneously adapts DINO's features to fit the motion observations of the test video, while training a tracker that directly leverages the refined features. The entire framework is trained end-to-end using a combination of self-supervised losses and regularization that allows us to retain and benefit from DINO's semantic prior. Extensive evaluation demonstrates that our method achieves state-of-the-art results on known benchmarks. DINO-Tracker significantly outperforms self-supervised methods and is competitive with state-of-the-art supervised trackers, while outperforming them in challenging cases of tracking under long-term occlusions.
Pages: 367-385
Page count: 19
Related papers
50 in total
  • [11] DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
    Pankov, Vikentii
    Pronina, Valeria
    Kuzmin, Alexander
    Borisov, Maksim
    Usoltsev, Nikita
    Zeng, Xingshan
    Golubkov, Alexander
    Ermolenko, Nikolai
    Shirshova, Aleksandra
    Matveeva, Yulia
    INTERSPEECH 2024, 2024, : 697 - 701
  • [12] A novel laser stripe key point tracker based on self-supervised learning and improved KCF for robotic welding seam tracking
    Xiao, Runquan
    Cao, Qixin
    Chen, Shanben
    JOURNAL OF MANUFACTURING PROCESSES, 2024, 127 : 660 - 670
  • [13] MAST: A Memory-Augmented Self-Supervised Tracker
    Lai, Zihang
    Lu, Erika
    Xie, Weidi
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 6478 - 6487
  • [14] Self-Supervised Deep Correlation Tracking
    Yuan, Di
    Chang, Xiaojun
    Huang, Po-Yao
    Liu, Qiao
    He, Zhenyu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 976 - 985
  • [15] An Effective Representation Learning Approach: The Integrated Self-Supervised Pre-Training Models of StyleGAN2-ADA and DINO for Colon Polyp Images
    Kim, Jong-Yeup
    Tangriberganov, Gayrat
    Jung, Woochul
    Kim, Dae Sung
    Koo, Hoon Sup
    Lee, Suehyun
    Kim, Sun Moon
    IEEE ACCESS, 2023, 11 : 143628 - 143634
  • [16] Self-supervised Learning for Endoscopic Video Analysis
    Hirsch, Roy
    Caron, Mathilde
    Cohen, Regev
    Livne, Amir
    Shapiro, Ron
    Golany, Tomer
    Goldenberg, Roman
    Freedman, Daniel
    Rivlin, Ehud
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT V, 2023, 14224 : 569 - 578
  • [17] Self-Supervised Video Desmoking for Laparoscopic Surgery
    Wu, Renlong
    Zhang, Zhilu
    Zhang, Shuohao
    Guo, Longfei
    Chen, Haobin
    Zhang, Lei
    Chen, Hao
    Zuo, Wangmeng
    COMPUTER VISION - ECCV 2024, PT LXXII, 2025, 15130 : 307 - 324
  • [18] SELF-SUPERVISED REPRESENTATION LEARNING FOR ULTRASOUND VIDEO
    Jiao, Jianbo
    Droste, Richard
    Drukker, Lior
    Papageorghiou, Aris T.
    Noble, J. Alison
    2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 1847 - 1850
  • [19] Self-supervised Amodal Video Object Segmentation
    Yao, Jian
    Hong, Yuxin
    Wang, Chiyu
    Xiao, Tianjun
    He, Tong
    Locatello, Francesco
    Wipf, David
    Fu, Yanwei
    Zhang, Zheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [20] Federated Self-supervised Learning for Video Understanding
    Rehman, Yasar Abbas Ur
    Gao, Yan
    Shen, Jiajun
    de Gusmao, Pedro Porto Buarque
    Lane, Nicholas
    COMPUTER VISION, ECCV 2022, PT XXXI, 2022, 13691 : 506 - 522