Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0
Authors
Raven, Tim [1]
Matei, Arthur [1]
Fink, Gernot A. [1]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
Source
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;
DOI
10.1007/978-3-031-70536-6_23
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
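The aggregation step named in the abstract, VLAD encoding of local features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `vlad_encode`, the random stand-in features, the hard nearest-centroid assignment, and the power plus L2 normalization are all assumptions chosen for illustration (the abstract does not specify them). In practice the centroids would typically come from k-means over training descriptors.

```python
import numpy as np


def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors of shape (n, d) into one VLAD vector of length k * d.

    Hypothetical helper for illustration; not taken from the paper.
    """
    k, d = centroids.shape
    # Squared Euclidean distance from every descriptor to every centroid: shape (n, k).
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment: index of the nearest centroid for each descriptor.
    nearest = dists.argmin(axis=1)

    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members) > 0:
            # Sum of residuals between the assigned descriptors and their centroid.
            vlad[i] = (members - centroids[i]).sum(axis=0)

    vlad = vlad.ravel()
    # Assumed post-processing: signed square root (power normalization), then L2 normalization.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


# Random stand-ins for local (e.g. patch-level) features of one page; not real ViT outputs.
rng = np.random.default_rng(0)
patch_features = rng.normal(size=(50, 8))
centroids = rng.normal(size=(4, 8))

page_vector = vlad_encode(patch_features, centroids)
print(page_vector.shape)
```

Two pages encoded this way can then be compared with a cosine or Euclidean distance to rank candidates for retrieval, which is one common way such global descriptors are used.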
Pages: 380-396
Page count: 17
Related papers
50 in total
  • [21] Enhanced Industrial Action Recognition Through Self-Supervised Visual Transformers
    Xiao, Yao
    Xiang, Hua
    Wang, Tongxi
    Wang, Yiju
    IEEE ACCESS, 2024, 12 : 134133 - 134143
  • [22] Self-Supervised Text Style Transfer with Rationale Prediction and Pretrained Transformers
    Sinclair, Neil
    Buys, Jan
    ARTIFICIAL INTELLIGENCE RESEARCH, SACAIR 2022, 2022, 1734 : 291 - 305
  • [23] Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics
    Varma, Arnav
    Chawla, Hemang
    Zonooz, Bahram
    Arani, Elahe
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 4, 2022, : 758 - 769
  • [24] Writer Retrieval using Compact Convolutional Transformers and NetMVLAD
    Peer, Marco
    Kleber, Florian
    Sablatnig, Robert
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 1571 - 1578
  • [25] Self-Supervised Pretraining of Transformers for Satellite Image Time Series Classification
    Yuan, Yuan
    Lin, Lei
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 474 - 487
  • [26] Iterative Self-Supervised Learning for Legal Similar Case Retrieval
    Liu, Yao
    Tan, Tien-Ping
    Zhan, Xiaoping
    IEEE ACCESS, 2024, 12 : 17231 - 17241
  • [27] Self-Supervised Visual Representations for Cross-Modal Retrieval
    Patel, Yash
    Gomez, Lluis
    Rusinol, Marcal
    Karatzas, Dimosthenis
    Jawahar, C., V
    ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 182 - 186
  • [28] Dissecting self-supervised learning methods for surgical computer vision
    Ramesh, Sanat
    Srivastav, Vinkle
    Alapatt, Deepak
    Yu, Tong
    Murali, Aditya
    Sestini, Luca
    Nwoye, Chinedu Innocent
    Hamoud, Idris
    Sharma, Saurav
    Fleurentin, Antoine
    Exarchakis, Georgios
    Karargyris, Alexandros
    Padoy, Nicolas
    MEDICAL IMAGE ANALYSIS, 2023, 88
  • [29] MS-DINO: Masked Self-Supervised Distributed Learning Using Vision Transformer
    Park, Sangjoon
    Lee, Ik Jae
    Kim, Jun Won
    Ye, Jong Chul
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (10) : 6180 - 6192
  • [30] COMPARATIVE ANALYSIS OF SELF-SUPERVISED PRE-TRAINED VISION TRANSFORMERS AND CONVOLUTIONAL NEURAL NETWORKS WITH CHEXNET IN CLASSIFYING LUNG CONDITIONS
    Elwirehardja, Gregorius Natanael
    Liem, Steve Marcello
    Adjie, Maria Linneke
    Tjan, Farrel Alexander
    Setiawan, Joselyn
    Syahputra, Muhammad Edo
    Muljo, Hery Harjono
    COMMUNICATIONS IN MATHEMATICAL BIOLOGY AND NEUROSCIENCE, 2025,