Spatial attention contrastive network for scene text recognition

Cited by: 1
Authors
Wang, Fan [1]
Yin, Dong [1]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei, Peoples R China
Keywords
contrastive learning; feature contrastive network; background suppression network; convolutional attention mechanism
DOI
10.1117/1.JEI.31.4.043026
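Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract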
At present, most scene text recognition methods achieve good performance by training models on large amounts of synthetic data. However, large datasets require substantial storage and computation, and there is a gap between synthetic and real data. To address these problems, we train a novel model, the spatial attention contrastive network (SAC-Net), on a small amount of real data. SAC-Net consists of a background suppression network (BSNet), a feature encoder, an attention decoder (ADEer), and a feature contrastive network (FCNet). The BSNet, based on U-Net, reduces background interference. To address the relatively low prediction accuracy of connectionist temporal classification, we design the ADEer, which improves performance by using a convolutional attention mechanism. Building on data augmentation, we design the FCNet, a contrastive learning component. Finally, in word accuracy on six benchmark test datasets, SAC-Net is nearly on par with the state-of-the-art model trained on a small amount of real data.
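The abstract does not state the loss used by the FCNet, so the following is only a generic illustration of contrastive learning over augmented views, not the authors' formulation. It is a minimal InfoNCE-style sketch in plain Python, assuming each feature vector is nonzero and that `views_a[i]` and `views_b[i]` come from two augmentations of the same image; the function names and the `temperature` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(views_a, views_b, temperature=0.5):
    """Mean InfoNCE-style loss over a batch (illustrative, not the paper's loss).

    views_a[i] and views_b[i] form a positive pair; views_b[j] with j != i
    act as negatives for views_a[i].
    """
    n = len(views_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(views_a[i], views_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the positive pair.
        total += log_denom - logits[i]
    return total / n


# Two toy "images", each represented by a 2-D feature vector.
features = [[1.0, 0.0], [0.0, 1.0]]
mismatched = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = info_nce(features, features)
mismatched_loss = info_nce(features, mismatched)
```

In this toy case the loss is lower when the two views of each sample agree than when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.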
Pages: 14
Related papers
40 entries in total
  • [1] Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
  • [2] PhotoOCR: Reading Text in Uncontrolled Conditions
    Bissacco, Alessandro
    Cummins, Mark
    Netzer, Yuval
    Neven, Hartmut
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 785 - 792
  • [3] Chee Kheng Chng, 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR). Proceedings, P1571, DOI 10.1109/ICDAR.2019.00252
  • [4] AON: Towards Arbitrarily-Oriented Text Recognition
    Cheng, Zhanzhan
    Xu, Yangliu
    Bai, Fan
    Niu, Yi
    Pu, Shiliang
    Zhou, Shuigeng
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5571 - 5579
  • [5] Focusing Attention: Towards Accurate Text Recognition in Natural Images
    Cheng, Zhanzhan
    Bai, Fan
    Xu, Yunlu
    Zheng, Gang
    Pu, Shiliang
    Zhou, Shuigeng
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5086 - 5094
  • [6] Cho K., 2014, P C EMP METH NAT LAN, P1724
  • [7] Glorot X, 2011, P 14 INT C ART INT S, V15, P315, DOI 10.1002/ECS2.1832
  • [8] Graves A, 2012, STUD COMPUT INTELL, V385, P1, DOI [10.1007/978-3-642-24797-2, 10.1162/neco.1997.9.1.1]
  • [9] Graves A., 2006, PROC 23 INT C MACHIN, P369
  • [10] Synthetic Data for Text Localisation in Natural Images
    Gupta, Ankush
    Vedaldi, Andrea
    Zisserman, Andrew
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2315 - 2324