Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0
Authors
Raven, Tim [1 ]
Matei, Arthur [1 ]
Fink, Gernot A. [1 ]
Institutions
[1] TU Dortmund Univ, Dortmund, Germany
Source
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;
DOI
10.1007/978-3-031-70536-6_23
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state-of-the-art performance on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
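The abstract's core pipeline aggregates local ViT features with VLAD encoding: each local descriptor is hard-assigned to its nearest codebook center, the residuals to that center are summed, and the concatenated result is normalized. The following NumPy sketch illustrates standard VLAD aggregation only; it is not the authors' implementation, and the codebook (e.g. from k-means over training descriptors) is assumed to be given.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Aggregate local descriptors (N, D) into a VLAD vector of length K*D.

    descriptors: local features, e.g. ViT patch tokens from foreground patches.
    centers:     (K, D) codebook of cluster centers, assumed precomputed
                 (e.g. via k-means on training descriptors).
    """
    # Hard-assign each descriptor to its nearest codebook center.
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)

    k_clusters, dim = centers.shape
    vlad = np.zeros((k_clusters, dim))
    for k in range(k_clusters):
        members = descriptors[assignments == k]
        if len(members) > 0:
            # Accumulate residuals of assigned descriptors to their center.
            vlad[k] = (members - centers[k]).sum(axis=0)

    vlad = vlad.ravel()
    # Power normalization followed by global L2 normalization,
    # a common post-processing step for VLAD descriptors.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

Retrieval then reduces to comparing the resulting fixed-length vectors, e.g. by cosine similarity between documents.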
Pages: 380-396
Page count: 17
Related Papers
50 items in total
  • [11] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)
  • [12] A Cross-Domain Threat Screening and Localization Framework Using Vision Transformers and Self-supervised Learning
    Nasim, Ammara
    Akram, Muhammad Usman
    Khan, Asad Mansoor
    Khan, Muhammad Belal Afsar
    Hassan, Taimur
    2024 14TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION SYSTEMS, ICPRS, 2024,
  • [13] Writer Identification and Writer Retrieval Using Vision Transformer for Forensic Documents
    Koepf, Michael
    Kleber, Florian
    Sablatnig, Robert
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 352 - 366
  • [14] Self-supervised learning of Vision Transformers for digital soil mapping using visual data
    Tresson, Paul
    Dumont, Maxime
    Jaeger, Marc
    Borne, Frederic
    Boivin, Stephane
    Marie-Louise, Loic
    Francois, Jeremie
    Boukcim, Hassan
    Goeau, Herve
    GEODERMA, 2024, 450
  • [15] Self-Writer: Clusterable Embedding Based Self-Supervised Writer Recognition from Unlabeled Data
    Mohammad, Zabir
    Kabir, Muhammad Mohsin
    Monowar, Muhammad Mostafa
    Hamid, Md Abdul
    Mridha, Muhammad Firoz
    MATHEMATICS, 2022, 10 (24)
  • [16] Understanding Self-Attention of Self-Supervised Audio Transformers
    Yang, Shu-wen
    Liu, Andy T.
    Lee, Hung-yi
    INTERSPEECH 2020, 2020, : 3785 - 3789
  • [17] The Retrieval of the Beautiful: Self-Supervised Salient Object Detection for Beauty Product Retrieval
    Wang, Jiawei
    Zhu, Shuai
    Xu, Jiao
    Cao, Da
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2548 - 2552
  • [18] SELF-SUPERVISED REMOTE SENSING IMAGE RETRIEVAL
    Walter, Kane
    Gibson, Matthew J.
    Sowmya, Arcot
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 1683 - 1686
  • [19] EslaXDET: A new X-ray baggage security detection framework based on self-supervised vision transformers
    Wu, Jiajie
    Xu, Xianghua
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [20] A Hierarchical Vision Transformer Using Overlapping Patch and Self-Supervised Learning
    Ma, Yaxin
    Li, Ming
    Chang, Jun
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,