Self-supervised Vision Transformers for Writer Retrieval

Authors
Raven, Tim [1]
Matei, Arthur [1]
Fink, Gernot A. [1]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
Source
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;
DOI
10.1007/978-3-031-70536-6_23
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
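The aggregation step named in the abstract, VLAD encoding of local features, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the codebook size, the normalization, or any post-processing, so the power and L2 normalization below are a common choice assumed here. The function name `vlad_encode`, the random `centroids` (which would normally come from k-means), and the random `page_features` (standing in for ViT foreground patch embeddings) are all hypothetical.

```python
import numpy as np


def vlad_encode(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Aggregate local descriptors (n, d) into one VLAD vector of length k * d."""
    # Hard-assign every descriptor to its nearest centroid (squared Euclidean distance).
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    k, d = centroids.shape
    vlad = np.zeros((k, d), dtype=np.float64)
    for j in range(k):
        members = features[nearest == j]
        if len(members) > 0:
            # Sum of residuals between the assigned descriptors and their centroid.
            vlad[j] = (members - centroids[j]).sum(axis=0)

    # Assumed normalization: signed square root, then global L2 normalization.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    flat = vlad.ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat


rng = np.random.default_rng(0)
centroids = rng.normal(size=(4, 8))       # hypothetical codebook: 4 clusters, 8-dim features
page_features = rng.normal(size=(50, 8))  # stand-in for the local features of one page
descriptor = vlad_encode(page_features, centroids)
```

Each page yields one fixed-length vector regardless of how many local features it contains, so pages can be ranked against a query by a simple distance such as cosine distance between their descriptors.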
Pages: 380-396
Page count: 17