SMILE: SEQUENCE-TO-SEQUENCE DOMAIN ADAPTATION WITH MINIMIZING LATENT ENTROPY FOR TEXT IMAGE RECOGNITION

Cited by: 6
Authors
Chang, Yen-Cheng [1]
Chen, Yi-Chang [1]
Chang, Yu-Chuan [1]
Yeh, Yi-Ren [2]
Affiliations
[1] E SUN Financial Holding Co Ltd, Taipei, Taiwan
[2] Natl Kaohsiung Normal Univ, Dept Math, Kaohsiung, Taiwan
Source
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2022
Keywords
domain adaptation; sequence-to-sequence; entropy minimization; self-paced learning;
DOI
10.1109/ICIP46576.2022.9897599
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Excellent text recognition results have been obtained by training recognition models with synthetic images. However, recognizing text in real-world images remains challenging because of the domain shift between synthetic and real-world text images. One strategy to eliminate this domain difference without manual annotation is unsupervised domain adaptation (UDA). Because text recognition is a sequential labeling task, most popular UDA methods cannot be applied to it directly. To tackle this problem, we propose a UDA method that minimizes latent entropy on sequence-to-sequence attention-based models with class-balanced self-paced learning. Experimental results show that the proposed framework achieves better recognition results than existing methods on most UDA text recognition benchmarks. All code is publicly available.
Pages: 431-435
Page count: 5