SAGHOG: Self-supervised Autoencoder for Generating HOG Features for Writer Retrieval

Cited by: 0

Authors:

Peer, Marco [1]

Kleber, Florian [1]

Sablatnig, Robert [1]

Affiliations:

[1] TU Wien, Comp Vis Lab, Vienna, Austria

Source:

Document Analysis and Recognition - ICDAR 2024, Part II | 2024 / Vol. 14805

Keywords:

Writer Retrieval; Self-Supervised Learning; Masked Autoencoder; Document Analysis

DOI:

10.1007/978-3-031-70536-6_8

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper introduces SAGHOG, a self-supervised pretraining strategy for writer retrieval using HOG features of the binarized input image. Our preprocessing applies the Segment Anything technique to extract handwriting from various datasets, yielding about 24k documents, and is followed by training a vision transformer to reconstruct masked patches of the handwriting. SAGHOG is then finetuned by appending NetRVLAD as an encoding layer to the pretrained encoder. Evaluation of our approach on three historical datasets, Historical-WI, HisFrag20, and GRK-Papyri, demonstrates the effectiveness of SAGHOG for writer retrieval. Additionally, we provide ablation studies on our architecture and evaluate unsupervised and supervised finetuning. Notably, on HisFrag20, SAGHOG outperforms related work with a mAP of 57.2%, a margin of 11.6% over the current state of the art, showcasing its robustness on challenging data, and it is competitive even on small datasets, e.g. GRK-Papyri, where we achieve a Top-1 accuracy of 58.0%.
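
The pretraining idea summarized in the abstract (reconstructing HOG features of masked patches of a binarized image) can be illustrated with a minimal sketch of how such targets might be built. This is not the authors' implementation: the patch size, the 9 orientation bins, the 75% mask ratio, and the helper names `hog_histogram`, `patch_hog_targets`, and `random_mask` are all assumptions chosen for illustration, and the vision transformer that would predict the targets is omitted.

```python
import math
import random


def hog_histogram(patch, bins=9):
    """L2-normalized histogram of unsigned gradient orientations for one patch."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the patch border.
            gx = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def patch_hog_targets(image, patch=4, bins=9):
    """Split a binarized image (list of rows) into patches; one HOG vector each."""
    h, w = len(image), len(image[0])
    targets = []
    for py in range(0, h - patch + 1, patch):
        for px in range(0, w - patch + 1, patch):
            block = [row[px:px + patch] for row in image[py:py + patch]]
            targets.append(hog_histogram(block, bins))
    return targets


def random_mask(num_patches, ratio=0.75, seed=0):
    """Indices of patches hidden from the encoder (ratio is an assumed value)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_patches), int(num_patches * ratio)))


# Toy binarized image: an 8x8 grid with a two-pixel-wide vertical stroke.
image = [[1 if x in (3, 4) else 0 for x in range(8)] for _ in range(8)]
targets = patch_hog_targets(image)
masked = random_mask(len(targets))
# A model would be trained to predict targets[i] for every masked index i.
masked_targets = {i: targets[i] for i in masked}
```

In this toy case every patch contains only horizontal gradients, so all of the histogram mass falls into the first orientation bin. A real pipeline would typically use a library HOG implementation and the paper's own hyperparameters.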

Pages: 121-138

Page count: 18
