SMILE: SEQUENCE-TO-SEQUENCE DOMAIN ADAPTATION WITH MINIMIZING LATENT ENTROPY FOR TEXT IMAGE RECOGNITION

Times cited: 7
Authors
Chang, Yen-Cheng [1]
Chen, Yi-Chang [1]
Chang, Yu-Chuan [1]
Yeh, Yi-Ren [2]
Affiliations
[1] E SUN Financial Holding Co Ltd, Taipei, Taiwan
[2] Natl Kaohsiung Normal Univ, Dept Math, Kaohsiung, Taiwan
Source
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2022
Keywords
domain adaptation; sequence-to-sequence; entropy minimization; self-paced learning
DOI
10.1109/ICIP46576.2022.9897599
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Excellent text recognition results have been obtained by training recognition models on synthetic images. However, recognizing text in real-world images remains challenging because of the domain shift between synthetic and real-world text images. One strategy for reducing this domain gap without manual annotation is unsupervised domain adaptation (UDA). Because text recognition is a sequence labeling task, most popular UDA methods cannot be applied to it directly. To tackle this problem, we propose a UDA method that minimizes latent entropy in sequence-to-sequence attention-based models with class-balanced self-paced learning. Experimental results show that the proposed framework achieves better recognition results than existing methods on most UDA text recognition benchmarks. All code is publicly available(1).
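The abstract combines two ingredients: entropy minimization on the decoder of an attention-based sequence-to-sequence recognizer, and class-balanced self-paced selection of target-domain samples. The Python below is a minimal sketch of how such a loss could be computed. It is not the authors' released code. It assumes that "latent entropy" can be approximated by the Shannon entropy of each decoding step's character distribution. The function names, the flat list-of-probabilities input, and the `keep_ratio` parameter are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def step_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one decoding step's character distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def class_balanced_entropy_loss(
    step_probs: List[Sequence[float]], keep_ratio: float
) -> float:
    """Mean entropy over a class-balanced, self-paced subset of decoding steps.

    Hypothetical interface (an assumption, not from the paper):
    step_probs -- softmax outputs, one list per decoded character, flattened
                  over a batch of unlabeled target-domain images.
    keep_ratio -- fraction of each pseudo-class's most confident steps to keep;
                  a self-paced schedule would raise it toward 1.0 over training.
    """
    # Group step entropies by pseudo-label (the argmax character class).
    by_class: Dict[int, List[float]] = defaultdict(list)
    for probs in step_probs:
        pseudo_label = max(range(len(probs)), key=lambda k: probs[k])
        by_class[pseudo_label].append(step_entropy(probs))

    # Per class, keep the lowest-entropy (most confident) fraction of steps,
    # so frequent characters cannot crowd out rare ones.
    selected: List[float] = []
    for entropies in by_class.values():
        entropies.sort()
        n_keep = max(1, math.ceil(keep_ratio * len(entropies)))
        selected.extend(entropies[:n_keep])

    return sum(selected) / len(selected) if selected else 0.0


# Usage: three decoding steps over a toy 3-character alphabet.
steps = [
    [0.98, 0.01, 0.01],  # confident, pseudo-label 0
    [0.40, 0.35, 0.25],  # uncertain, pseudo-label 0 (dropped at keep_ratio=0.5)
    [0.05, 0.90, 0.05],  # confident, pseudo-label 1
]
loss = class_balanced_entropy_loss(steps, keep_ratio=0.5)
```

Grouping by pseudo-label before choosing confident steps is what makes the selection class-balanced: each character class contributes its own most confident steps, rather than a single global threshold favoring frequent characters. In actual training, the entropies of the selected steps would be computed with differentiable tensor operations so that minimizing their mean updates the recognizer.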
Pages: 431-435
Number of pages: 5