Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0
Authors
Raven, Tim [1]
Matei, Arthur [1]
Fink, Gernot A. [1]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
Source
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD
DOI
10.1007/978-3-031-70536-6_23
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully to writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
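The VLAD aggregation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `vlad_encode`, the plain-list inputs, and the power-plus-L2 normalization are assumptions chosen for illustration, and the codebook of centroids is taken as given (in practice it is typically learned with k-means over local descriptors).

```python
import math


def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors into one VLAD vector of length K * D.

    descriptors: list of D-dimensional feature vectors (e.g. local patch features).
    centroids:   list of K codebook vectors, assumed to be precomputed.
    """
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]

    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid.
        nearest = min(range(k), key=lambda i: math.dist(x, centroids[i]))
        # Accumulate the residual (descriptor minus centroid) for that cluster.
        for j in range(d):
            residuals[nearest][j] += x[j] - centroids[nearest][j]

    # Flatten the K residual sums into a single vector.
    flat = [v for row in residuals for v in row]
    # Signed square-root (power) normalization; an assumed, commonly used choice.
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    # L2 normalization so that encodings can be compared by cosine similarity.
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy usage: three 2-D descriptors and a two-entry codebook.
page_vector = vlad_encode([[1, 0], [0, 1], [10, 14]], [[0, 0], [10, 10]])
```

In a retrieval setting, one such vector would be computed per document page and pages would be ranked by similarity to the query page's vector; the details of that ranking are not given in this record.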
Pages: 380-396
Number of pages: 17
Related papers
50 in total
  • [41] Clinical Outcome Prediction in COVID-19 using Self-supervised Vision Transformer Representations
    Konwer, Aishik
    Prasanna, Prateek
    MEDICAL IMAGING 2022: COMPUTER-AIDED DIAGNOSIS, 2022, 12033
  • [42] Few-shot segmentation for esophageal OCT images based on self-supervised vision transformer
    Wang, Cong
    Gan, Meng
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (02)
  • [43] SELF-SUPERVISED SPEAKER VERIFICATION WITH SIMPLE SIAMESE NETWORK AND SELF-SUPERVISED REGULARIZATION
    Sang, Mufan
    Li, Haoqi
    Liu, Fang
    Arnold, Andrew O.
    Wan, Li
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6127 - 6131
  • [44] ExpPoint-MAE: Better Interpretability and Performance for Self-Supervised Point Cloud Transformers
    Romanelis, Ioannis
    Fotis, Vlassis
    Moustakas, Konstantinos
    Munteanu, Adrian
    IEEE ACCESS, 2024, 12 : 53565 - 53578
  • [45] Self-Supervised Learning with Graph Neural Networks for Region of Interest Retrieval in Histopathology
    Ozen, Yigit
    Aksoy, Selim
    Kosemehmetoglu, Kemal
    Onder, Sevgen
    Uner, Aysegul
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 6329 - 6334
  • [46] Self-supervised cross-modal visual retrieval from brain activities
    Ye, Zesheng
    Yao, Lina
    Zhang, Yu
    Gustin, Sylvia
    PATTERN RECOGNITION, 2024, 145
  • [47] Self-supervised Image-based 3D Model Retrieval
    Song, Dan
    Zhang, Chu-Meng
    Zhao, Xiao-Qian
    Wang, Teng
    Nie, Wei-Zhi
    Li, Xuan-Ya
    Liu, An-An
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (02)
  • [48] Exploring the Effect of Dataset Diversity in Self-supervised Learning for Surgical Computer Vision
    Jaspers, Tim J. M.
    de Jonker, Ronald L. P. D.
    Al Khalil, Yasmina
    Zeelenberg, Tijn
    Kusters, Carolus H. J.
    Li, Yiping
    van Jaarsveld, Romy C.
    Bakker, Franciscus H. A.
    Ruurda, Jelle P.
    Brinkman, Willem M.
    De With, Peter H. N.
    van der Sommen, Fons
    DATA ENGINEERING IN MEDICAL IMAGING, DEMI 2024, 2025, 15265 : 43 - 53
  • [49] Online, Self-Supervised Vision-Based Terrain Classification in Unstructured Environments
    Moghadam, Peyman
    Wijesoma, Wijerupage Sardha
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 3100 - 3105
  • [50] Self-supervised vision transformer-based few-shot learning for facial expression recognition
    Chen, Xuanchi
    Zheng, Xiangwei
    Sun, Kai
    Liu, Weilong
    Zhang, Yuang
    INFORMATION SCIENCES, 2023, 634 : 206 - 226