Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0

Authors:

Raven, Tim [1]

Matei, Arthur [1]

Fink, Gernot A. [1]

Affiliations:

[1] TU Dortmund Univ, Dortmund, Germany

Source:

DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805

Keywords:

Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;

DOI:

10.1007/978-3-031-70536-6_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
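The abstract names VLAD encoding as the step that aggregates local ViT features into one descriptor per page. The following is a minimal sketch of generic VLAD aggregation, not the authors' implementation: the function name `vlad_encode`, the random stand-in features, the codebook size, and the choice of power plus L2 normalization are all assumptions made for illustration.

```python
import numpy as np

def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors (n, d) into one VLAD vector of length k * d.

    Hypothetical helper: generic VLAD, not the paper's exact pipeline.
    """
    k, d = centroids.shape
    # Hard-assign each descriptor to its nearest centroid (squared Euclidean distance).
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignments = dists.argmin(axis=1)
    # Sum the residuals (descriptor minus centroid) within each cluster.
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assignments == i]
        if len(members) > 0:
            vlad[i] = (members - centroids[i]).sum(axis=0)
    vlad = vlad.ravel()
    # Power normalization followed by L2 normalization (an assumed, common choice).
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# Random stand-ins for foreground patch embeddings and a learned codebook.
rng = np.random.default_rng(0)
patch_features = rng.normal(size=(200, 16))
codebook = patch_features[rng.choice(200, size=4, replace=False)]
page_descriptor = vlad_encode(patch_features, codebook)
```

In a retrieval setting, page descriptors built this way would typically be compared with cosine distance; in practice the codebook would come from clustering training features (for example with k-means) rather than from random sampling.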


Pages: 380-396

Page count: 17
