Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0
Authors
Raven, Tim [1 ]
Matei, Arthur [1 ]
Fink, Gernot A. [1 ]
Institutions
[1] TU Dortmund Univ, Dortmund, Germany
Source
DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;
DOI
10.1007/978-3-031-70536-6_23
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state-of-the-art performance on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
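The abstract's core pipeline aggregates local ViT features with VLAD encoding: each local descriptor is hard-assigned to its nearest codebook center, the residuals to that center are summed, and the concatenated result is normalized. The following NumPy sketch illustrates standard VLAD aggregation only; it is not the authors' implementation, and the codebook (e.g. from k-means over training descriptors) is assumed to be given.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Aggregate local descriptors (N, D) into a VLAD vector of length K*D.

    descriptors: local features, e.g. ViT patch tokens from foreground patches.
    centers:     (K, D) codebook of cluster centers, assumed precomputed
                 (e.g. via k-means on training descriptors).
    """
    # Hard-assign each descriptor to its nearest codebook center.
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)

    k_clusters, dim = centers.shape
    vlad = np.zeros((k_clusters, dim))
    for k in range(k_clusters):
        members = descriptors[assignments == k]
        if len(members) > 0:
            # Accumulate residuals of assigned descriptors to their center.
            vlad[k] = (members - centers[k]).sum(axis=0)

    vlad = vlad.ravel()
    # Power normalization followed by global L2 normalization,
    # a common post-processing step for VLAD descriptors.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

Retrieval then reduces to comparing the resulting fixed-length vectors, e.g. by cosine similarity between documents.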
Pages: 380-396
Page count: 17
Related Papers
50 items in total
  • [11] Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers
    Pincic, Domagoj
    Susanj, Diego
    Lenac, Kristijan
    SENSORS, 2022, 22 (19)
  • [12] A Cross-Domain Threat Screening and Localization Framework Using Vision Transformers and Self-supervised Learning
    Nasim, Ammara
    Akram, Muhammad Usman
    Khan, Asad Mansoor
    Khan, Muhammad Belal Afsar
    Hassan, Taimur
    2024 14TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION SYSTEMS, ICPRS, 2024,
  • [13] Writer Identification and Writer Retrieval Using Vision Transformer for Forensic Documents
    Koepf, Michael
    Kleber, Florian
    Sablatnig, Robert
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 352 - 366
  • [14] Self-supervised learning of Vision Transformers for digital soil mapping using visual data
    Tresson, Paul
    Dumont, Maxime
    Jaeger, Marc
    Borne, Frederic
    Boivin, Stephane
    Marie-Louise, Loic
    Francois, Jeremie
    Boukcim, Hassan
    Goeau, Herve
    GEODERMA, 2024, 450
  • [15] Self-Writer: Clusterable Embedding Based Self-Supervised Writer Recognition from Unlabeled Data
    Mohammad, Zabir
    Kabir, Muhammad Mohsin
    Monowar, Muhammad Mostafa
    Hamid, Md Abdul
    Mridha, Muhammad Firoz
    MATHEMATICS, 2022, 10 (24)
  • [16] Understanding Self-Attention of Self-Supervised Audio Transformers
    Yang, Shu-wen
    Liu, Andy T.
    Lee, Hung-yi
    INTERSPEECH 2020, 2020, : 3785 - 3789
  • [17] The Retrieval of the Beautiful: Self-Supervised Salient Object Detection for Beauty Product Retrieval
    Wang, Jiawei
    Zhu, Shuai
    Xu, Jiao
    Cao, Da
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2548 - 2552
  • [18] SELF-SUPERVISED REMOTE SENSING IMAGE RETRIEVAL
    Walter, Kane
    Gibson, Matthew J.
    Sowmya, Arcot
    IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 1683 - 1686
  • [19] EslaXDET: A new X-ray baggage security detection framework based on self-supervised vision transformers
    Wu, Jiajie
    Xu, Xianghua
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [20] A Hierarchical Vision Transformer Using Overlapping Patch and Self-Supervised Learning
    Ma, Yaxin
    Li, Ming
    Chang, Jun
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,