DOC: Text Recognition via Dual Adaptation and Clustering

Cited by: 7
Authors
Ding, Xue-Ying [1]
Liu, Xiao-Qian [1]
Luo, Xin [1]
Xu, Xin-Shun [1]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Adaptation models; Text recognition; Task analysis; Image recognition; Training; Data models; unsupervised domain adaptation; domain shift; clustering; NETWORK;
DOI
10.1109/TMM.2023.3245404
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Unsupervised domain adaptation has recently been introduced to text image recognition to address the serious domain shift problem by transferring knowledge from source domains to target ones. In this setting, however, the target domain has no label information to supervise the adaptation, especially at the character level. Several existing methods regard a text image as a whole and perform only global feature adaptation, neglecting local-level feature adaptation, i.e., adaptation of characters. Other methods focus only on word-level feature alignment while ignoring the categories of local-level characters. To address these issues, we propose a text recognition model via Dual adaptatiOn and Clustering, DOC for short. At the word level, we construct a Global Discriminator for global feature adaptation to reduce text layout bias between the source and target domains. At the character level, we propose an Adaptive Feature Clustering (AFC) module, which extracts invariant character features through a local-level discriminator for adaptation. Moreover, it enhances the local feature adaptation with a clustering scheme, which evaluates the feature adaptation by leveraging the knowledge from the source domain as much as possible. In this way, the model can pay more attention to the differences among fine-grained characters. Extensive experiments on benchmark datasets demonstrate that our framework achieves state-of-the-art performance.
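The abstract does not spell out how the clustering scheme uses source-domain knowledge, so the following is only a minimal sketch of one common reading: build a mean feature vector (prototype) per character class from labeled source features, then assign each unlabeled target character feature to its nearest prototype. The function names, the Euclidean distance, and the toy two-dimensional features are all assumptions made for illustration. They are not taken from the paper and should not be read as the AFC module itself.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Return the mean feature vector per class (hypothetical helper)."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        # Accumulate the per-dimension sum for this class.
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def assign_to_prototypes(target_features, prototypes):
    """Pair each target feature with its nearest prototype label and distance."""
    assignments = []
    for vec in target_features:
        # Euclidean distance is an assumed choice of similarity measure.
        best = min(prototypes, key=lambda lab: math.dist(vec, prototypes[lab]))
        assignments.append((best, math.dist(vec, prototypes[best])))
    return assignments


# Toy data: labeled source character features and unlabeled target features.
source_features = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
source_labels = ["a", "a", "b", "b"]
target_features = [[0.1, 0.1], [0.9, 1.0]]

prototypes = class_prototypes(source_features, source_labels)
pseudo_labels = [lab for lab, _ in assign_to_prototypes(target_features, prototypes)]
```

In a sketch like this, the returned distances could serve as a rough signal of how well target character features line up with source classes, which is one way to picture "evaluating the feature adaptation" in the absence of target labels.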
Pages: 9071-9081
Number of pages: 11