Data Augmentation for Scene Text Recognition

Cited by: 23
Authors
Atienza, Rowel [1]
Affiliations
[1] Univ Philippines, Elect & Elect Engn Inst, Quezon City, Philippines
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021) | 2021
DOI
10.1109/ICCVW54120.2021.00181
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text recognition (STR) is a challenging task in computer vision due to the large number of possible text appearances in natural scenes. Most STR models rely on synthetic datasets for training because no sufficiently large, publicly available labelled real datasets exist. Because STR models are evaluated on real data, the mismatch between training and testing data distributions results in poor performance, especially on challenging text that is affected by noise, artifacts, geometry, structure, etc. In this paper, we introduce STRAug, which is made up of 36 image augmentation functions designed for STR. Each function mimics certain text image properties that can be found in natural scenes, caused by camera sensors, or induced by signal processing operations, but that are poorly represented in the training dataset. When applied to strong baseline models using RandAugment, STRAug significantly increases the overall absolute accuracy of STR models across regular and irregular test datasets by as much as 2.10% on Rosetta, 1.48% on R2AM, 1.30% on CRNN, 1.35% on RARE, 1.06% on TRBA and 0.89% on GCRNN. The diversity and simplicity of the API provided by STRAug functions enable easy replication and validation of existing data augmentation methods for STR. STRAug is available at https://github.com/roatienza/straug.
Pages: 1561-1570
Page count: 10