TextAdapter: Self-Supervised Domain Adaptation for Cross-Domain Text Recognition

Cited by: 1

Authors:

Liu, Xiao-Qian [1]

Zhang, Peng-Fei [2]

Luo, Xin [1]

Huang, Zi [2]

Xu, Xin-Shun [1]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China

[2] Univ Queensland, Sch Elect Engn & Comp Sci, Brisbane, Qld 4072, Australia

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Funding:

National Natural Science Foundation of China;

Keywords:

Text recognition; Decoding; Semantics; Prototypes; Adaptation models; Task analysis; Data models; Self-supervised learning; contrastive learning; consistency regularization; domain adaptation; text recognition;

DOI:

10.1109/TMM.2024.3400669

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Text recognition remains challenging, primarily because annotated real data is scarce and annotating large-scale real data is labor-intensive. Most existing solutions rely on synthetic training data, where the synthetic-to-real domain gap limits model performance on real data. Unsupervised domain adaptation (UDA) methods have been proposed to obtain domain-invariant representations. However, they commonly focus on domain-level alignment and neglect fine-grained character features, which leads to indistinguishable characters. In this paper, we propose a simple yet effective self-supervised UDA framework tailored for cross-domain text recognition, named TextAdapter, which integrates contrastive learning and consistency regularization to mitigate domain gaps. Specifically, a fine-grained feature alignment module based on character contrastive learning is designed to learn domain-invariant character representations through category-level alignment. Additionally, to address the task-agnostic problem in contrastive learning, i.e., ignoring the sequence semantics, an instance consistency matching module is proposed to perceive contextual semantics by matching the prediction consistency across different augmented views of the target data. Experimental results on cross-domain benchmarks demonstrate the effectiveness of our method. Furthermore, TextAdapter can be embedded in most off-the-shelf text recognition models and achieves new state-of-the-art performance, which illustrates the generality of our framework.
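The two ingredients named in the abstract, character-level contrastive alignment and prediction consistency across augmented views, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the loss formulas, so the InfoNCE-style class-conditional contrastive form, the symmetric KL consistency term, the temperature value, and the function names `character_contrastive_loss` and `consistency_loss` are all assumptions made here for illustration.

```python
import math


def _normalize(vec):
    """Scale a feature vector to unit L2 norm (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def character_contrastive_loss(features, labels, temperature=0.1):
    """Hypothetical category-level contrastive loss over character features.

    features: list of per-character feature vectors, pooled from both domains.
    labels:   character class id per feature (assumed: ground truth for
              source data, pseudo-labels for target data).
    Characters of the same class are pulled together, others pushed apart.
    """
    z = [_normalize(f) for f in features]
    n = len(z)
    losses = []
    for i in range(n):
        # Temperature-scaled cosine similarity to every other character.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner is skipped
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Negative mean log-probability of the positives for this anchor.
        losses.append(-sum(sims[j] - log_denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


def consistency_loss(probs_a, probs_b, eps=1e-8):
    """Hypothetical consistency term between two augmented views.

    probs_a, probs_b: per-decoding-step character distributions predicted
    for two augmentations of the same target image. Returns half the sum of
    KL(a||b) and KL(b||a), averaged over decoding steps.
    """
    total = 0.0
    for pa, pb in zip(probs_a, probs_b):
        # KL(a||b) + KL(b||a) simplifies to sum((a - b) * (log a - log b)).
        total += sum(
            (a - b) * (math.log(a + eps) - math.log(b + eps))
            for a, b in zip(pa, pb)
        )
    return 0.5 * total / len(probs_a)
```

In a training loop under these assumptions, the two terms would be added to the supervised recognition loss on labeled source data, with weighting coefficients chosen by validation.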

Pages: 9854-9865

Page count: 12
