Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0

Authors:

Raven, Tim [1]

Matei, Arthur [1]

Fink, Gernot A. [1]

Affiliations:

[1] TU Dortmund Univ, Dortmund, Germany

Source:

DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805

Keywords:

Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;

DOI:

10.1007/978-3-031-70536-6_23

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
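The abstract names VLAD encoding as the aggregation step but gives no details, so the following is only a minimal, generic sketch of VLAD in NumPy and not the authors' implementation. The codebook size, the feature dimension, the random stand-in features, and the power-plus-L2 normalization are all assumptions made for illustration; `vlad_encode`, `centroids`, and `page_features` are hypothetical names.

```python
import numpy as np


def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors (n, d) into one VLAD vector of length k*d."""
    # Hard-assign every descriptor to its nearest centroid (squared Euclidean distance).
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    k, d = centroids.shape
    vlad = np.zeros((k, d), dtype=np.float64)
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members) > 0:
            # Sum of residuals between the assigned descriptors and their centroid.
            vlad[i] = (members - centroids[i]).sum(axis=0)

    vlad = vlad.ravel()
    # Power normalization followed by L2 normalization (a common choice, assumed here).
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


rng = np.random.default_rng(0)
centroids = rng.normal(size=(4, 8))       # hypothetical codebook: 4 clusters, 8-dim features
page_features = rng.normal(size=(50, 8))  # stand-in for local foreground features from a ViT
encoding = vlad_encode(page_features, centroids)
```

In a retrieval setting, one such vector per page would typically be compared against the others with a distance such as cosine distance; which distance and which post-processing the paper uses is not stated in the abstract.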


Pages: 380-396

Page count: 17

50 items in total

[1] Self-supervised Vision Transformers with Data Augmentation Strategies Using Morphological Operations for Writer Retrieval
Peer, Marco
Kleber, Florian
Sablatnig, Robert
FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022, 2022, 13639 : 122 - 136
[2] Self-supervised vision transformers for semantic segmentation
Gu, Xianfan
Hu, Yingdong
Wen, Chuan
Gao, Yang
COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
[3] Self-Supervised Vision Transformers for Malware Detection
Seneviratne, Sachith
Shariffdeen, Ridwan
Rasnayaka, Sanka
Kasthuriarachchi, Nuran
IEEE ACCESS, 2022, 10 : 103121 - 103135
[4] Exploring Self-Supervised Vision Transformers for Gait Recognition in the Wild
Cosma, Adrian
Catruna, Andy
Radoi, Emilian
SENSORS, 2023, 23 (05)
[5] SAGHOG: Self-supervised Autoencoder for Generating HOG Features for Writer Retrieval
Peer, Marco
Kleber, Florian
Sablatnig, Robert
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 121 - 138
[6] Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers
Saavedra-Ruiz, Miguel
Morin, Sacha
Paull, Liam
2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022), 2022, : 197 - 204
[7] Self-Supervised Augmented Vision Transformers for Remote Physiological Measurement
Pang, Liyu
Li, Xiaoou
Wang, Zhen
Lei, Xueyi
Pei, Yulong
2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 623 - 627
[8] SELF-SUPERVISED VISION TRANSFORMERS FOR JOINT SAR-OPTICAL REPRESENTATION LEARNING
Wang, Yi
Albrecht, Conrad M.
Zhu, Xiao Xiang
2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 139 - 142
[9] Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers
Hu, Hao
Baldassarre, Federico
Azizpour, Hossein
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III, 2023, 13715 : 409 - 426
[10] Self-supervised Vision Transformers for 3D pose estimation of novel objects
Thalhammer, Stefan
Weibel, Jean-Baptiste
Vincze, Markus
Garcia-Rodriguez, Jose
IMAGE AND VISION COMPUTING, 2023, 139
