DINO-Tracker: Taming DINO for Self-supervised Point Tracking in a Single Video

Cited by: 0
Authors
Tumanyan, Narek [1]
Singer, Assaf [1]
Bagon, Shai [1]
Dekel, Tali [1]
Affiliations
[1] Weizmann Inst Sci, Rehovot, Israel
Source
Keywords
DOI
10.1007/978-3-031-73347-5_21
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present DINO-Tracker, a new framework for long-term dense tracking in video. The pillar of our approach is combining test-time training on a single video with the powerful localized semantic features learned by a pre-trained DINO-ViT model. Specifically, our framework simultaneously adapts DINO's features to fit the motion observations of the test video, while training a tracker that directly leverages the refined features. The entire framework is trained end-to-end using a combination of self-supervised losses and regularization that allows us to retain and benefit from DINO's semantic prior. Extensive evaluation demonstrates that our method achieves state-of-the-art results on known benchmarks. DINO-Tracker significantly outperforms self-supervised methods and is competitive with state-of-the-art supervised trackers, while outperforming them in challenging cases of tracking under long-term occlusions.
Pages: 367 - 385
Page count: 19
Related papers
50 in total
  • [41] Self-supervised Sparse Representation for Video Anomaly Detection
    Wu, Jhih-Ciang
    Hsieh, He-Yen
    Chen, Ding-Jie
    Fuh, Chiou-Shann
    Liu, Tyng-Luh
    COMPUTER VISION, ECCV 2022, PT XIII, 2022, 13673 : 729 - 745
  • [42] Broaden Your Views for Self-Supervised Video Learning
    Recasens, Adria
    Luc, Pauline
    Alayrac, Jean-Baptiste
    Wang, Luyu
    Strub, Florian
    Tallec, Corentin
    Malinowski, Mateusz
    Patraucean, Viorica
    Altche, Florent
    Valko, Michal
    Grill, Jean-Bastien
    van den Oord, Aaron
    Zisserman, Andrew
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1235 - 1245
  • [43] Self-Supervised Generation of Spatial Audio for 360° Video
    Morgado, Pedro
    Vasconcelos, Nuno
    Langlois, Timothy
    Wang, Oliver
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [44] Cascaded Siamese Self-supervised Audio to Video GAN
    Aldausari, Nuha
    Sowmya, Arcot
    Marcus, Nadine
    Mohammadi, Gelareh
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 4690 - 4699
  • [45] Self-supervised Video Object Segmentation by Motion Grouping
    Yang, Charig
    Lamdouar, Hala
    Lu, Erika
    Zisserman, Andrew
    Xie, Weidi
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 7157 - 7168
  • [46] Self-supervised learning of class embeddings from video
    Wiles, Olivia
    Koepke, A. Sophia
    Zisserman, Andrew
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 3019 - 3027
  • [47] Video Motion Perception for Self-supervised Representation Learning
    Li, Wei
    Luo, Dezhao
    Fang, Bo
    Li, Xiaoni
    Zhou, Yu
    Wang, Weiping
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT IV, 2022, 13532 : 508 - 520
  • [48] Self-supervised Moving Vehicle Tracking with Stereo Sound
    Gan, Chuang
    Zhao, Hang
    Chen, Peihao
    Cox, David
    Torralba, Antonio
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 7052 - 7061
  • [49] Self-supervised discriminative model prediction for visual tracking
    Yuan, Di
    Geng, Gu
    Shu, Xiu
    Liu, Qiao
    Chang, Xiaojun
    He, Zhenyu
    Shi, Guangming
    Neural Computing and Applications, 2024, 36 : 5153 - 5164
  • [50] Self-Supervised Video-Centralised Transformer for Video Face Clustering
    Wang, Yujiang
    Dong, Mingzhi
    Shen, Jie
    Luo, Yiming
    Lin, Yiming
    Ma, Pingchuan
    Petridis, Stavros
    Pantic, Maja
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12944 - 12959