Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0

Authors:

Raven, Tim [1]

Matei, Arthur [1]

Fink, Gernot A. [1]

Affiliations:

[1] TU Dortmund Univ, Dortmund, Germany

Source:

DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805

Keywords:

Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;

DOI:

10.1007/978-3-031-70536-6_23

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state-of-the-art performance on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
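The abstract names VLAD encoding as the step that aggregates local ViT features into one descriptor per page. The following is a minimal sketch of generic VLAD aggregation, not the authors' implementation: the function name `vlad_encode`, the hard nearest-centroid assignment, and the power-plus-L2 normalization are assumptions chosen for illustration, and the codebook `centroids` is assumed to have been learned beforehand (for example with k-means).

```python
import numpy as np


def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors (n, d) into one VLAD vector of length k * d.

    Hypothetical helper for illustration; `centroids` (k, d) is an assumed,
    pre-computed codebook.
    """
    k, d = centroids.shape
    # Hard-assign each descriptor to its nearest centroid (squared Euclidean distance).
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    # Sum the residuals (descriptor minus centroid) per cluster.
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            vlad[i] = (members - centroids[i]).sum(axis=0)

    # Assumed normalization: signed square root, then global L2 normalization.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

With such page-level vectors, retrieval can be done by ranking all other pages by cosine similarity to the query, which is the kind of ranking that mAP scores.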


Pages: 380-396

Page count: 17
