Robust Text Image Recognition via Adversarial Sequence-to-Sequence Domain Adaptation

Times cited: 25

Authors:

Zhang, Yaping [1, 2]

Nie, Shuai [1]

Liang, Shan [1]

Liu, Wenju [1]

Affiliations:

[1] Chinese Acad Sci CASIA, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Keywords:

Text recognition; Image recognition; Adaptation models; Task analysis; Character recognition; Training; Visualization; Sequence-to-sequence; domain adaptation; text image recognition;

DOI:

10.1109/TIP.2021.3066903

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Robust text reading is a very challenging problem because the distribution of text images changes significantly in real-world scenarios. One effective solution is to align the distributions of different domains with domain adaptation methods. However, we found that these methods may struggle when dealing with sequence-like text images. An important reason is that conventional domain adaptation methods strive to align images as a whole, whereas text images consist of variable-length, fine-grained character information. To address this issue, we propose a novel Adversarial Sequence-to-Sequence Domain Adaptation (ASSDA) method that learns "where to adapt" and "how to align" sequential images. Our key idea is to mine the local regions that contain characters and to align them across domains in an adversarial manner. Extensive text recognition experiments show that ASSDA can efficiently transfer sequence knowledge, and they demonstrate its promise in handling various domain shifts in real-world applications.
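The abstract's key idea, aligning character-containing local regions across domains adversarially rather than aligning whole images, can be illustrated with the minimal pure-Python sketch below. It is not the authors' ASSDA implementation. It assumes, hypothetically, that an attention-based decoder's per-character attention weights select the local regions, so each decoded character yields one "glimpse" (an attention-weighted sum of encoder feature columns). A logistic domain classifier scores the glimpses, and a gradient reversal layer (the technique from domain-adversarial training by Ganin and Lempitsky, 2015) flips the gradient so the feature update increases the classifier's loss. All function names are hypothetical, and the glimpses are updated directly as a stand-in for back-propagating into an encoder.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def character_glimpses(features: List[Vector], attention: List[Vector]) -> List[Vector]:
    """One attention-weighted sum of encoder columns per decoded character.

    features:  T encoder column vectors of dimension D (hypothetical layout).
    attention: one length-T weight vector per decoded character.
    """
    dim = len(features[0])
    glimpses = []
    for weights in attention:
        g = [0.0] * dim
        for w, column in zip(weights, features):
            for d in range(dim):
                g[d] += w * column[d]
        glimpses.append(g)
    return glimpses


def domain_loss(glimpses: List[Vector], label: int, w: Vector, b: float) -> float:
    """Mean binary cross-entropy of a logistic domain classifier (0 = source, 1 = target)."""
    total = 0.0
    for g in glimpses:
        p = min(max(sigmoid(dot(w, g) + b), 1e-12), 1.0 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(glimpses)


def domain_loss_grad(glimpses: List[Vector], label: int, w: Vector, b: float) -> List[Vector]:
    """Gradient of domain_loss w.r.t. each glimpse: (p - label) * w / N."""
    n = len(glimpses)
    return [[(sigmoid(dot(w, g) + b) - label) * wi / n for wi in w] for g in glimpses]


def gradient_reversal(grad: Vector, lam: float) -> Vector:
    """Backward pass of a gradient reversal layer: scale the incoming gradient by -lam."""
    return [-lam * x for x in grad]


def adversarial_feature_step(glimpses, label, w, b, lr=0.5, lam=1.0):
    """Descend on the *reversed* gradient, which moves the glimpses toward higher
    domain-classifier loss, i.e. toward domain confusion."""
    grads = domain_loss_grad(glimpses, label, w, b)
    return [
        [gi - lr * ri for gi, ri in zip(g, gradient_reversal(gr, lam))]
        for g, gr in zip(glimpses, grads)
    ]


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # T = 3 columns, D = 2
    attention = [[0.8, 0.2, 0.0], [0.0, 0.1, 0.9]]  # two decoded characters
    g = character_glimpses(features, attention)      # approx. [[0.8, 0.2], [0.9, 1.0]]
    w, b = [1.0, -0.5], 0.0                           # hypothetical classifier weights
    before = domain_loss(g, 0, w, b)                  # source-domain glimpses
    after = domain_loss(adversarial_feature_step(g, 0, w, b), 0, w, b)
    print(before < after)  # True
```

Only the per-character glimpses enter the domain classifier, which is what separates this from whole-image alignment. In an actual training loop, the classifier would be trained to minimize the same loss that the reversed gradient pushes the features to maximize.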


Pages: 3922-3933

Page count: 12
