Self-supervised Vision Transformers for Writer Retrieval

Cited by: 0
Authors
Raven, Tim [1]
Matei, Arthur [1]
Fink, Gernot A. [1]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
Keywords
Writer Retrieval; Writer Identification; Historical Documents; Self-Supervised Learning; Vision Transformer; IDENTIFICATION; FEATURES; VLAD;
DOI
10.1007/978-3-031-70536-6_23
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
While methods based on Vision Transformers (ViT) have achieved state-of-the-art performance in many domains, they have not yet been applied successfully in the domain of writer retrieval. The field is dominated by methods using handcrafted features or features extracted from Convolutional Neural Networks. In this work, we bridge this gap and present a novel method that extracts features from a ViT and aggregates them using VLAD encoding. The model is trained in a self-supervised fashion without any need for labels. We show that extracting local foreground features is superior to using the ViT's class token in the context of writer retrieval. We evaluate our method on two historical document collections. We set a new state-of-the-art performance on the Historical-WI dataset (83.1% mAP) and the HisIR19 dataset (95.0% mAP). Additionally, we demonstrate that our ViT feature extractor can be directly applied to modern datasets such as the CVL database (98.6% mAP) without any fine-tuning.
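The abstract mentions aggregating local features with VLAD encoding. As a rough illustration of what that aggregation step generally looks like, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `vlad_encode`, the hard nearest-centroid assignment, and the power plus L2 normalization are assumptions based on the commonly used VLAD formulation, and the centroids are assumed to come from a separately fitted codebook (for example k-means).

```python
import numpy as np

def vlad_encode(descriptors, centroids):
    """Aggregate local descriptors (n, d) into one VLAD vector of length k * d.

    Hypothetical helper for illustration; `centroids` has shape (k, d).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    k, d = centroids.shape

    # Hard-assign each descriptor to its nearest centroid (squared Euclidean).
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    # Sum the residuals (descriptor minus centroid) per cluster.
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            vlad[i] = (members - centroids[i]).sum(axis=0)
    vlad = vlad.ravel()

    # Assumed normalization: signed square root, then L2.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

In a retrieval setting, one such vector would be computed per document page, and pages would then be ranked by a similarity measure such as cosine distance between their vectors.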
Pages: 380-396
Number of pages: 17
Related papers
50 in total
  • [41] Positional Label for Self-Supervised Vision Transformer
    Zhang, Zhemin
    Gong, Xun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3516 - 3524
  • [42] A Cross-Domain Threat Screening and Localization Framework Using Vision Transformers and Self-supervised Learning
    Nasim, Ammara
    Akram, Muhammad Usman
    Khan, Asad Mansoor
    Khan, Muhammad Belal Afsar
    Hassan, Taimur
    2024 14TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION SYSTEMS, ICPRS, 2024,
  • [43] Discriminative Sampling of Proposals in Self-Supervised Transformers for Weakly Supervised Object Localization
    Murtaza, Shakeeb
    Belharbi, Soufiane
    Pedersoli, Marco
    Sarraf, Aydin
    Granger, Eric
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW), 2023, : 155 - 165
  • [44] Self-Writer: Clusterable Embedding Based Self-Supervised Writer Recognition from Unlabeled Data
    Mohammad, Zabir
    Kabir, Muhammad Mohsin
    Monowar, Muhammad Mostafa
    Hamid, Md Abdul
    Mridha, Muhammad Firoz
    MATHEMATICS, 2022, 10 (24)
  • [45] Self-Supervised Temporal Sensitive Hashing for Video Retrieval
    Li, Qihua
    Tian, Xing
    Ng, Wing W. Y.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9021 - 9035
  • [46] Self-Supervised Graph Convolution for Video Moment Retrieval
    Hu, Xiwen
    Wang, Guolong
    Shan, Shimin
    Liu, Yu
    Li, Jiangquan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PART X, 2023, 14263 : 407 - 419
  • [47] On Self-Supervised Learning and Prompt Tuning of Vision Transformers for Cross-sensor Fingerprint Presentation Attack Detection
    Nadeem, Maryam
    Nandakumar, Karthik
    2023 IEEE INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS, IJCB, 2023,
  • [48] Applying masked autoencoder-based self-supervised learning for high-capability vision transformers of electrocardiographies
    Sawano, Shinnosuke
    Kodera, Satoshi
    Setoguchi, Naoto
    Tanabe, Kengo
    Kushida, Shunichi
    Kanda, Junji
    Saji, Mike
    Nanasato, Mamoru
    Maki, Hisataka
    Fujita, Hideo
    Kato, Nahoko
    Watanabe, Hiroyuki
    Suzuki, Minami
    Takahashi, Masao
    Sawada, Naoko
    Yamasaki, Masao
    Sato, Masataka
    Katsushika, Susumu
    Shinohara, Hiroki
    Takeda, Norifumi
    Fujiu, Katsuhito
    Daimon, Masao
    Akazawa, Hiroshi
    Morita, Hiroyuki
    Komuro, Issei
    PLOS ONE, 2024, 19 (08):
  • [49] EslaXDET: A new X-ray baggage security detection framework based on self-supervised vision transformers
    Wu, Jiajie
    Xu, Xianghua
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127