DINO-Tracker: Taming DINO for Self-supervised Point Tracking in a Single Video

Cited by: 0

Authors:

Tumanyan, Narek [1]

Singer, Assaf [1]

Bagon, Shai [1]

Dekel, Tali [1]

Affiliations:

[1] Weizmann Institute of Science, Rehovot, Israel

Source:

COMPUTER VISION - ECCV 2024, PT XXVI | 2025 / Vol. 15084

Keywords:

DOI:

10.1007/978-3-031-73347-5_21

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We present DINO-Tracker, a new framework for long-term dense tracking in video. The pillar of our approach is combining test-time training on a single video with the powerful localized semantic features learned by a pre-trained DINO-ViT model. Specifically, our framework simultaneously adopts DINO's features to fit to the motion observations of the test video, while training a tracker that directly leverages the refined features. The entire framework is trained end-to-end using a combination of self-supervised losses and regularization that allows us to retain and benefit from DINO's semantic prior. Extensive evaluation demonstrates that our method achieves state-of-the-art results on known benchmarks. DINO-Tracker significantly outperforms self-supervised methods and is competitive with state-of-the-art supervised trackers, while outperforming them in challenging cases of tracking under long-term occlusions.


Pages: 367-385

Page count: 19
