Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0

Authors:

Raven, Tim [1]

Matei, Arthur [1]

Fink, Gernot A. [1]

Affiliations:

[1] TU Dortmund Univ, Dortmund, Germany

Source:

DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805

Keywords:

Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;

DOI:

10.1007/978-3-031-70536-6_23

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state of the art on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
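The abstract names VLAD encoding as the aggregation step but gives no details. The following is a minimal sketch of generic VLAD, not the authors' implementation. It assumes the local descriptors are plain lists of floats (in the paper they would come from the ViT) and that the codebook `centroids` was learned beforehand, for example with k-means. The function name `vlad_encode` is hypothetical, and the power and L2 normalization steps are common VLAD practice, assumed here rather than taken from the record.

```python
import math

def vlad_encode(features, centroids):
    """Aggregate local descriptors into one VLAD vector of length K * D.

    features:  list of N descriptors, each a list of D floats (toy stand-in
               for ViT patch embeddings; an assumption of this sketch).
    centroids: list of K codebook vectors of the same dimension D,
               assumed to be precomputed (e.g. by k-means).
    """
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for f in features:
        # Hard-assign the descriptor to its nearest centroid
        # (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(f, centroids[i])),
        )
        # Accumulate the residual (descriptor minus centroid) for that cluster.
        for j in range(d):
            residuals[nearest][j] += f[j] - centroids[nearest][j]
    flat = [v for row in residuals for v in row]
    # Assumed normalization: signed square root, then L2.
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy usage: two 2-D descriptors, two centroids.
descriptor = vlad_encode([[1.0, 0.0], [0.0, 4.0]], [[0.0, 0.0], [0.0, 3.0]])
```

Under these assumptions, one such vector per page could be compared to others with cosine similarity to rank candidate writers.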


Pages: 380-396

Number of pages: 17
