DINO-Tracker: Taming DINO for Self-supervised Point Tracking in a Single Video

Times cited: 0

Authors:

Tumanyan, Narek [1]

Singer, Assaf [1]

Bagon, Shai [1]

Dekel, Tali [1]

Affiliations:

[1] Weizmann Inst Sci, Rehovot, Israel

Source:

COMPUTER VISION - ECCV 2024, PT XXVI | 2025 / Vol. 15084

DOI:

10.1007/978-3-031-73347-5_21

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present DINO-Tracker, a new framework for long-term dense tracking in video. The pillar of our approach is combining test-time training on a single video with the powerful localized semantic features learned by a pre-trained DINO-ViT model. Specifically, our framework simultaneously adapts DINO's features to fit the motion observations of the test video, while training a tracker that directly leverages the refined features. The entire framework is trained end-to-end using a combination of self-supervised losses and regularization that allows us to retain and benefit from DINO's semantic prior. Extensive evaluation demonstrates that our method achieves state-of-the-art results on known benchmarks. DINO-Tracker significantly outperforms self-supervised methods and is competitive with state-of-the-art supervised trackers, while outperforming them in challenging cases of tracking under long-term occlusions.

Pages: 367-385

Page count: 19
