TextAdapter: Self-Supervised Domain Adaptation for Cross-Domain Text Recognition

Cited by: 1
Authors
Liu, Xiao-Qian [1 ]
Zhang, Peng-Fei [2 ]
Luo, Xin [1 ]
Huang, Zi [2 ]
Xu, Xin-Shun [1 ]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
[2] Univ Queensland, Sch Elect Engn & Comp Sci, Brisbane, Qld 4072, Australia
Funding
National Natural Science Foundation of China;
Keywords
Text recognition; Decoding; Semantics; Prototypes; Adaptation models; Task analysis; Data models; Self-supervised learning; contrastive learning; consistency regularization; domain adaptation; text recognition;
DOI
10.1109/TMM.2024.3400669
Chinese Library Classification (CLC) number
TP [automation technology, computer technology];
Discipline code
0812;
Abstract
Text recognition remains challenging, primarily due to the scarcity of annotated real data and the heavy labor required to annotate real data at scale. Most existing solutions rely on synthetic training data, where synthetic-to-real domain gaps limit model performance on real data. Unsupervised domain adaptation (UDA) methods have been proposed to obtain domain-invariant representations. However, they commonly focus on domain-level alignment, neglecting fine-grained character features and thus producing indistinguishable characters. In this paper, we propose a simple yet effective self-supervised UDA framework tailored for cross-domain text recognition, named TextAdapter, which integrates contrastive learning and consistency regularization to mitigate domain gaps. Specifically, a fine-grained feature alignment module based on character contrastive learning learns domain-invariant character representations through category-level alignment. Additionally, to address the task-agnostic problem in contrastive learning, i.e., its neglect of sequence semantics, an instance consistency matching module perceives contextual semantics by matching prediction consistency among different augmented views of the target data. Experimental results on cross-domain benchmarks demonstrate the effectiveness of our method. Furthermore, TextAdapter can be embedded in most off-the-shelf text recognition models, achieving new state-of-the-art performance and illustrating the generality of our framework.
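The two modules described in the abstract can be illustrated with toy loss functions. The sketch below is a reconstruction from the abstract alone, not the authors' released code: the function names, the NumPy setting, the shapes, the temperature value, and the use of KL divergence for the consistency term are all assumptions.

```python
import numpy as np

def char_contrastive_loss(feats, labels, temperature=0.1):
    """Category-level contrastive loss over per-character features.

    feats:  (N, D) character-level features pooled from source and target
    labels: (N,)   character class ids (pseudo-labels on the target domain)
    Same-class characters attract; all other pairs repel.
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T / temperature            # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                 # exclude self-pairs
    m = sim.max(axis=1, keepdims=True)             # stabilized row-wise log-softmax
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    pos = (labels[None, :] == labels[:, None]) & ~np.eye(len(feats), dtype=bool)
    has_pos = pos.any(axis=1)                      # anchors with at least one positive
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return per_anchor.mean()

def _softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_view_a, logits_view_b):
    """KL divergence between per-step class predictions of two augmented views."""
    p, q = _softmax(logits_view_a), _softmax(logits_view_b)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()
```

Presumably the full adaptation objective combines terms like these with a supervised recognition loss on the labeled source data; the loss weighting and the augmentation policy are not specified in the abstract.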
Pages: 9854-9865
Page count: 12